Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Posters

Poster Categories
Poster Schedule
Preparing your Poster - Information and Poster Size
How to mount your poster
Print your poster in Basel

View Posters By Category

Session A: (July 22 and July 23)
Session B: (July 24 and July 25)

Presentation Schedule for July 22, 6:00 pm – 8:00 pm

Presentation Schedule for July 23, 6:00 pm – 8:00 pm

Presentation Schedule for July 24, 6:00 pm – 8:00 pm

Session A Poster Set-up and Dismantle
Session A Posters set up: Monday, July 22 between 7:30 am - 10:00 am
Session A Posters should be removed at 8:00 pm, Tuesday, July 23.

Session B Poster Set-up and Dismantle
Session B Posters set up: Wednesday, July 24 between 7:30 am - 10:00 am
Session B Posters should be removed at 2:00 pm, Thursday, July 25.

T-01: MiR-205-5p and miR-342-3p cooperate in the repression of the E2F1 transcription factor in the context of anticancer chemotherapy resistance
COSI: RNA COSI
  • Xin Lai, Universitätsklinikum Erlangen, Germany
  • Julio Vera, Universitätsklinikum Erlangen, Germany

Short Abstract: High rates of lethal outcome in tumour metastasis are associated with the acquisition of chemoresistance. Several clinical studies indicate that E2F1 overexpression culminates in unfavourable prognosis and chemoresistance in patients. Thus, fine-tuning the expression of E2F1 could be a promising approach for treating patients showing chemoresistance. We integrated bioinformatics, structural and kinetic modelling, and experiments to study cooperative regulation of E2F1 by microRNAs in the context of chemoresistance. We showed that an enhanced E2F1 repression efficiency can be achieved in chemoresistant tumour cells through two cooperating microRNAs. We then employed molecular dynamics simulations to show that miR-205-5p and miR-342-3p can form the most stable triplex with E2F1 mRNA. A mathematical model simulating the E2F1 regulation by the cooperative microRNAs predicted enhanced E2F1 repression, a feature that was verified by in vitro experiments. Finally, we integrated this cooperative microRNA regulation into a more comprehensive network to account for E2F1-related chemoresistance in tumour cells. The network model simulations and experimental data indicate the ability of enhanced expression of both miR-205-5p and miR-342-3p to decrease tumour chemoresistance by cooperatively repressing E2F1. Our results suggest that pairs of cooperating microRNAs could be used as potential RNA therapeutics to reduce E2F1-related chemoresistance.
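The cooperative repression described above can be illustrated with a toy kinetic model. This sketch is not the authors' model. It assumes that E2F1 is produced at a constant rate and degraded at a rate that grows with each microRNA. It also adds a hypothetical cooperative term (`k_coop`) proportional to the product of both microRNA levels. All parameter values are placeholders.

```python
def simulate_e2f1(m1, m2, k_syn=1.0, k_deg=0.1, k1=0.2, k2=0.2, k_coop=0.5,
                  dt=0.01, t_end=200.0):
    """Toy model: dE/dt = k_syn - (k_deg + k1*m1 + k2*m2 + k_coop*m1*m2) * E.

    m1, m2 are fixed, hypothetical levels of miR-205-5p and miR-342-3p.
    Integrates with forward Euler from E = 0 and returns E at t_end.
    """
    # Effective first-order decay rate of E2F1 under microRNA repression;
    # the k_coop term only contributes when both microRNAs are present.
    decay = k_deg + k1 * m1 + k2 * m2 + k_coop * m1 * m2
    e = 0.0
    for _ in range(round(t_end / dt)):
        e += dt * (k_syn - decay * e)
    return e


for m1, m2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(m1, m2, round(simulate_e2f1(m1, m2), 3))
```

At steady state E = k_syn / decay. With both microRNAs present, the cooperative term therefore pushes E2F1 below the level expected from simply adding their individual effects, which you can check by rerunning with `k_coop=0.0`.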

T-02: Comparative transcriptome-wide analysis of Methicillin-Resistant and Methicillin-Sensitive Staphylococcus aureus for identification of the antibiotic resistance gene mecA
COSI: RNA COSI
  • Sushmita Dhusia, Christian College of Nursing, SHUATS, India
  • Ahsan Rizvi, Institute of Human Genetics, CNRS, France
  • Kalyani Dhusia, Michigan State University, United States

Short Abstract: Virulent Staphylococcus aureus strains account for most nosocomial and community-onset infections through the production of toxins. Several studies have examined common antibiotic resistance in methicillin-resistant and methicillin-sensitive strains of S. aureus (MRSA/MSSA) and their associated gene expression profiles. We compared the differential gene expression profiles of MRSA and the reference MSSA strain by analysing Illumina RNA-seq data with the R package DEG-seq2. In the present study, antibiotic resistance and virulence of MRSA and methicillin-sensitive S. aureus were analysed with respect to the differentially expressed mecA gene by screening fifth-generation antibiotic candidates. Differential gene expression was calculated from count data normalized with the transcripts per million (TPM) method using R packages. Molecular docking and simulation studies were then performed to identify potential inhibitors among beta-lactam antibiotics against penicillin-binding protein 2a (PBP-2a, encoded by mecA). An RMSD of 3.65 Å indicated maximum compactness of PBP-2a when bound to piperacillin. Collectively, the gene expression profile, including several virulence and drug-resistance factors, confirmed the unique and highly virulent determinants of MRSA. The efficacy of the predicted inhibitors could be improved, and better candidates designed, based on their initial binding affinity to PBP-2a in wet-lab studies.
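TPM normalization works in two steps. Each gene's read count is first divided by its length in kilobases, and the per-sample values are then rescaled to sum to one million. The sketch below assumes that raw counts and effective lengths are already available per gene. The gene names and numbers in the example are illustrative only.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- dict: gene -> raw read count for one sample
    lengths_bp -- dict: gene -> effective gene/transcript length in base pairs
    """
    # Step 1: reads per kilobase (corrects for gene length).
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: scale so that all values in the sample sum to one million.
    per_million = sum(rpk.values()) / 1e6
    return {gene: value / per_million for gene, value in rpk.items()}


# Illustrative counts and lengths (geneX/geneY are placeholders).
tpm = counts_to_tpm({"mecA": 500, "geneX": 1000, "geneY": 300},
                    {"mecA": 2000, "geneX": 2000, "geneY": 600})
print(tpm)
```

Because of the length correction, geneY ends up with the same TPM as geneX despite having fewer reads.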

T-03: Identification of CRISPR arrays using machine learning approach
COSI: RNA COSI
  • Alexander Mitrofanov, University of Freiburg, Germany
  • Omer S. Alkhnbashi, University of Freiburg, Germany
  • Kira S. Makarova, National Center for Biotechnology Information, National Library of Medicine, United States
  • Eugene Koonin, National Center for Biotechnology Information, National Library of Medicine, United States
  • Rolf Backofen, Albert-Ludwigs-University Freiburg, Germany

Short Abstract: The CRISPR-Cas systems degrade foreign genetic elements and are widespread in archaea and bacteria. CRISPR-Cas immune systems have a central, RNA-based component. Several bioinformatics tools have been developed to detect CRISPR arrays based solely on DNA sequences, but all these tools employ the same strategy of looking for repetitive patterns, which might correspond to CRISPR array repeats. The identified patterns are evaluated using a fixed, built-in scoring function, and arrays exceeding a cut-off value are reported. Here, with our CRISPR detection tool, we instead introduce a data-driven approach which uses machine learning to detect and differentiate true CRISPR arrays from false ones based on several features. Generally, our method performs three steps: detection, feature extraction and classification based on a manually curated set of positive and negative examples for CRISPR arrays. We demonstrate that our approach is not only capable of identifying previously annotated CRISPR arrays, but also of predicting novel CRISPR array candidates. Finally, in contrast to the existing tools, our approach not only provides the user with the basic statistics about the identified CRISPR arrays but also produces a certainty score as an easily understandable measure of a given genomic region’s likelihood to be a CRISPR array.
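The detection and feature-extraction steps can be sketched for exact repeats only. This is a simplified illustration, not the authors' tool. It looks for k-mers (hypothetical default k = 23) that occur at least three times with spacer-sized gaps between copies. For each such k-mer it reports a few features that a trained classifier could then score. Real CRISPR repeats are often imperfect and vary in length, and a repeat longer than k will yield several overlapping candidates here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_repeat_candidates(seq, k=23, min_copies=3, min_spacer=20, max_spacer=60):
    """Detection + feature extraction for exact, CRISPR-like direct repeats.

    Reports every k-mer that occurs at least `min_copies` times with all gaps
    between consecutive copies inside [min_spacer, max_spacer].
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    candidates = []
    for kmer, pos in positions.items():
        if len(pos) < min_copies:
            continue
        # Spacer length = gap between the end of one copy and the start of the next.
        spacers = [b - (a + k) for a, b in zip(pos, pos[1:])]
        if all(min_spacer <= s <= max_spacer for s in spacers):
            candidates.append({
                "repeat": kmer,
                "copies": len(pos),
                "start": pos[0],
                "mean_spacer": mean(spacers),
                "spacer_sd": pstdev(spacers),
            })
    return candidates
```

The returned feature dictionaries are the kind of input a classifier would need. The classification step itself, trained on curated positive and negative arrays, is not shown.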

T-04: Elucidation of novel therapeutic targets for acute myeloid leukemias with RUNX1-RUNX1T1 fusion
COSI: RNA COSI
  • Sejong Chun, Chonnam National University, South Korea
  • Jae Won Yun, Samsung Medical Center, South Korea
  • Yoon Kyung Bae, Samsung Advanced Institute for Health Science and Technology, South Korea
  • Kyeung Min Joo, Samsung Advanced Institute for Health Science and Technology, South Korea
  • Woong-Yang Park, Samsung Genome Institute, Samsung Medical Center, Gangnam-gu, Seoul 06351, South Korea

Short Abstract: The RUNX1-RUNX1T1 fusion is a frequent chromosomal alteration in acute myeloid leukemias (AMLs). Although the RUNX1-RUNX1T1 fusion protein plays a pivotal role in the development of AMLs carrying the fusion, it is difficult to target because it lacks kinase activity. Here, we used a sophisticated bioinformatic tool to elucidate targetable signaling pathways in AMLs with the RUNX1-RUNX1T1 fusion. In an analysis of 93 AMLs from the TCGA database, the expression of 293 genes correlated with the expression of the RUNX1-RUNX1T1 fusion gene. Based on these 293 genes, the cyclooxygenase (COX), vascular endothelial growth factor receptor (VEGFR), platelet-derived growth factor receptor (PDGFR), and fibroblast growth factor receptor (FGFR) pathways are predicted to be specifically activated in AMLs with the RUNX1-RUNX1T1 fusion. Moreover, when these pathways were inhibited pharmacologically, the in vitro proliferation of AML cells with the RUNX1-RUNX1T1 fusion decreased significantly more than that of AML cells without the fusion. The results indicate that novel targetable signaling pathways can be identified by analyzing the gene expression features of AMLs with non-targetable genetic alterations. Elucidating specific molecular targets for AMLs with a given genetic alteration would promote personalized treatment and improve outcomes for AML patients in the clinic.
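Genes whose expression tracks the fusion transcript across samples can be selected with a correlation screen. The abstract does not state which statistic or cut-off was used. The Pearson coefficient and the |r| ≥ 0.5 threshold below are therefore assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def fusion_correlated_genes(expr, fusion, min_abs_r=0.5):
    """Rank genes whose expression correlates with fusion-transcript expression.

    expr   -- dict: gene -> per-sample expression values
    fusion -- per-sample expression of the fusion transcript (same sample order)
    """
    r_by_gene = {gene: pearson(values, fusion) for gene, values in expr.items()}
    kept = {g: r for g, r in r_by_gene.items() if abs(r) >= min_abs_r}
    # Strongest correlations (positive or negative) first.
    return dict(sorted(kept.items(), key=lambda kv: -abs(kv[1])))
```

The resulting gene list would then feed a pathway-level analysis. That step is not sketched here.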

T-05: Evaluating the impact of RNA purification kit and blood collection tube in the extracellular RNA quality control study – important considerations for liquid biopsies
COSI: RNA COSI
  • exRNAQC Consortium
  • Jasper Anckaert, Ghent University, Belgium
  • Francisco Avila Cobos, Ghent University, Belgium
  • Anneleen Decock, Ghent University, Belgium
  • Jill Deleu, Ghent University, Belgium
  • Olivier De Wever, Ghent University, Belgium
  • Bert Dhondt, Ghent University, Belgium
  • Celine Everaert, Ghent University, Belgium
  • Carolina Fierro, Biogazelle, Belgium
  • Hetty Helsmoortel, Ghent University, Belgium
  • An Hendrix, Ghent University, Belgium
  • Eva Hulstaert, Ghent University, Belgium
  • Pieter Mestdagh, Ghent University, Belgium
  • Annelien Morlion, Ghent University, Belgium
  • Nele Nijs, Biogazelle, Belgium
  • Justine Nuytens, Ghent University, Belgium
  • Annouck Philippron, Ghent University, Belgium
  • Kathleen Schoofs, Ghent University, Belgium
  • Gary Schroth, Illumina, Belgium
  • Eveline Vanden Eynde, Ghent University, Belgium
  • Céleste Van der Schueren, Ghent University, Belgium
  • Jo Vandesompele, Ghent University, Belgium
  • Ruben Van Paemel, Ghent University, Belgium
  • Kimberly Verniers, Ghent University, Belgium
  • Nurten Yigit, Ghent University, Belgium

Short Abstract: In search of easily accessible biomarkers, extracellular RNAs (exRNAs) have emerged as potential candidates. Unfortunately, exRNA quantification is influenced by many pre-analytical variables and a comprehensive quality control study for blood-based liquid biopsies, evaluating pre-analytical variables in a controlled and systematic manner, is currently lacking. Therefore, we initiated the exRNA quality control (exRNAQC) study. We evaluate the effect of the type of blood collection tube (n=10), time between blood draw and plasma preparation (n=3), centrifugation speed during plasma preparation (n=5), input volume and RNA purification method (n=8). The impact of these factors is assessed by unbiased transcriptome exRNA profiling of all microRNAs and messenger RNAs from healthy donors’ plasma using established RNA-sequencing workflows. In the first phase of our study, we observed large differences in RNA purification kit performance in terms of reproducibility, yield and transcriptome complexity. We are currently analyzing the blood collection tube exRNA profiles. Once all pre-analytical variables are evaluated separately, we will integrate our findings into a full factorial experiment and plan dedicated follow-up experiments to validate our findings. Using this systematic approach, we aim to develop quality control metrics and guidelines for the study of exRNA in order to facilitate further progress in the field.
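In a full factorial experiment, every combination of factor levels is run. Enumerating such a design is straightforward. The factor levels below are placeholders, not the study's choices.

```python
from itertools import product
from math import prod


def full_factorial(factors):
    """Enumerate every combination of factor levels (one dict per run).

    factors -- dict: factor name -> list of levels
    """
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Hypothetical reduced design: level names are placeholders.
design = full_factorial({
    "tube": ["tube_1", "tube_2"],
    "delay_h": [0, 24],
    "purification_kit": ["kit_A", "kit_B", "kit_C"],
})
print(len(design))  # 12 runs (2 * 2 * 3)

# Crossing all levels listed in the abstract (10 tubes x 3 delays x
# 5 centrifugation speeds x 8 purification methods), before input volume:
print(prod([10, 3, 5, 8]))  # 1200
```

The second count gives a sense of how quickly a complete design grows once every evaluated level is included.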

T-06: RNA 2D/3D structure prediction with a consensus of contact methods
COSI: RNA COSI
  • Russell Hamilton, University of Cambridge, United Kingdom
  • Anne Ferguson-Smith, University of Cambridge, United Kingdom
  • William Taylor, Francis Crick Institute, United Kingdom

Short Abstract: RNA structures are formed from canonical base pairs (Watson-Crick A:U and G:C, plus the G:U wobble pair) that build structural elements such as stem-loops. However, non-canonical base pairs mediated through the Hoogsteen and sugar edges of the nucleotides permit many more base-pairing combinations, enabling more elaborate 3D structures. Motifs such as the G-quadruplex, i-motif and kink-turn have been suggested to mediate the translation of the RNA; however, the roles of these motifs remain enigmatic. Accurate prediction of these motifs is therefore essential for furthering our understanding of the functions of non-canonical base-paired motifs. RNA base pairs predicted using correlated mutation approaches can provide powerful restraints on the 2D/3D conformation of RNA motifs. However, we have previously shown that predicted contacts can be too erratic to provide generalised RNA 3D predictions. To further assess this limitation, we evaluate contact predictions made by four methods (CCMpred, R-scape, plmc, pySCA) and benchmark their individual performance against databases of RNAs with known structures (Rfam & RNA-puzzles). We then produce a consensus of restraints, supplemented with minimum free energy calculations that take base stacking into account. The performance of the consensus restraints is assessed by their ability to accurately predict 3D structures. We discuss the strengths and weaknesses of each method and present the results as a dynamic web resource.
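Majority voting is one simple way to build a consensus of restraints: a base-pair contact is kept only if several methods predict it. This is a sketch of that idea, not the authors' procedure, and the support threshold is an assumption. The minimum sequence separation of 4 reflects the usual requirement of at least three unpaired nucleotides in a hairpin loop. The method names serve only as dictionary keys, and their contact lists are made up.

```python
from collections import Counter


def consensus_contacts(predictions, min_support=2, min_separation=4):
    """Majority-vote consensus of predicted base-pair contacts.

    predictions    -- dict: method name -> iterable of (i, j) contacts (1-based)
    min_support    -- number of methods that must predict a contact
    min_separation -- minimum j - i (4 leaves room for a 3-nt hairpin loop)
    """
    support = Counter()
    for contacts in predictions.values():
        # Normalise (i, j) order and count each contact once per method.
        support.update({(min(i, j), max(i, j)) for i, j in contacts})
    return sorted(pair for pair, n in support.items()
                  if n >= min_support and pair[1] - pair[0] >= min_separation)


# Made-up contact lists keyed by the four method names.
preds = {
    "CCMpred": [(1, 20), (2, 19), (5, 7)],
    "R-scape": [(20, 1), (3, 18)],
    "plmc": [(2, 19), (5, 7)],
    "pySCA": [(10, 30)],
}
print(consensus_contacts(preds))  # [(1, 20), (2, 19)]
```

The pair (5, 7) has enough support but is dropped because it is too close in sequence to close a hairpin.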

T-07: Integrative analysis reveals common microRNAs regulating similar network of pathways dysregulated across multiple carcinomas
COSI: RNA COSI
  • Divya Niveditha, Birla Institute of Technology and Science, India
  • Shibasish Chowdhury, Birla Institute of Technology and Science, India

Short Abstract: Cancer is a complex disease whose global burden has made early detection an absolute necessity. In this regard, short non-coding RNA molecules, microRNAs (miRNAs), have shown great promise because their presence in circulating fluids facilitates non-invasive cancer detection. In this study, an in silico comparative analysis was performed to identify signature miRNAs dysregulated across multiple carcinomas. miRNA-seq data from cancer patients were obtained from the GDC portal, and their differential expression, along with the pathways they regulate, was analyzed. Our results show twelve miRNAs commonly dysregulated across seven cancer types. Interestingly, four of these miRNAs (hsa-mir-210, hsa-mir-19a, hsa-mir-7 and hsa-mir-3662) have already been reported as circulating miRNAs, while the miR-183 cluster along with hsa-mir-93 has been found incorporated in exosomes, underscoring the potential of the identified miRNAs as prospective non-invasive biomarkers. Furthermore, we identified six common miRNAs that are reported here for the first time as cancer biomarkers. Our data are significant because we not only identify a set of common miRNAs dysregulated in multiple cancers but also highlight similar pathways regulated by them, which might facilitate the development of future non-invasive biomarkers for early cancer detection.
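Finding miRNAs shared across cancer types amounts to counting, for each miRNA, how many cancer-specific differential expression lists contain it. The sketch below uses made-up lists. It does not show the differential expression step itself (thresholds, statistical test).

```python
from collections import Counter


def shared_mirnas(de_by_cancer, min_cancers=None):
    """Return miRNAs called dysregulated in at least `min_cancers` cancer types
    (default: in every cancer type supplied).

    de_by_cancer -- dict: cancer type -> iterable of dysregulated miRNA IDs
    """
    needed = len(de_by_cancer) if min_cancers is None else min_cancers
    # Count each miRNA at most once per cancer type.
    counts = Counter(m for mirnas in de_by_cancer.values() for m in set(mirnas))
    return sorted(m for m, n in counts.items() if n >= needed)


# Hypothetical per-cancer lists (not the study's results).
de = {
    "BRCA": ["hsa-mir-210", "hsa-mir-7"],
    "LUAD": ["hsa-mir-210", "hsa-mir-93"],
    "COAD": ["hsa-mir-210", "hsa-mir-7"],
}
print(shared_mirnas(de))                 # ['hsa-mir-210']
print(shared_mirnas(de, min_cancers=2))  # ['hsa-mir-210', 'hsa-mir-7']
```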

T-08: Innovative advanced computational solutions for improved gene and transcript level analysis using RNA-seq
COSI: RNA COSI
  • Runxuan Zhang, The James Hutton Institute, United Kingdom
  • Wenbin Guo, University of Dundee, United Kingdom
  • Cristiane Calixto, University of Dundee, Brazil
  • Allan James, University of Glasgow, United Kingdom
  • Hugh Nimmo, University of Glasgow, United Kingdom
  • John Brown, University of Dundee, United Kingdom

Short Abstract: Understanding the current limitations of RNA-seq is crucial for reliable expression analysis. We have developed: 1) Novel methods to construct high-quality transcript references using RNA-seq (Zhang et al. 2017 NAR) and Iso-seq data. Comprehensive and accurate transcript references ensure accurate transcript quantification, which underpins research on post-transcriptional regulation (alternative splicing (AS), alternative polyadenylation (APA), translation, etc.). 2) A cutting-edge pipeline (3D RNA-seq) that detects differential gene expression, alternative splicing, and transcript usage. It allows biologists with no programming skills to perform simple and rapid expression/AS analysis of RNA-seq experiments (https://github.com/wyguo/ThreeDRNAseq). 3) A Shiny app (TSIS) (Guo et al. 2017 Bioinformatics) to detect and characterize significant transcript isoform switches in time-series RNA-seq data. 4) Tools that accurately identify open reading frames (ORFs), avoiding the ORF mis-annotation found in many databases (Transfix; Brown et al. 2015 Plant Cell), and that characterize transcripts encoding protein variants, AS events, premature termination codons and nonsense-mediated decay features (Transfeature). These tools/methods enabled a large-scale investigation of expression/AS in a cold time series in Arabidopsis, revealing a massive and rapid AS response and identifying novel cold-responsive transcription and splicing factors regulated only by AS (Calixto et al. 2018 Plant Cell).
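An isoform switch, of the kind TSIS reports, can be thought of as a point where the time-series profiles of two isoforms of the same gene cross. The sketch below shows only that core idea, using linear interpolation between time points. It omits the smoothing, replicate handling and significance filtering that a real tool needs, and the example values are made up.

```python
def isoform_switches(times, iso1, iso2):
    """Find points where two isoforms' expression profiles cross.

    Returns (estimated switch time, direction) for every interval in which
    iso1 - iso2 changes sign; the crossing time is linearly interpolated.
    """
    diff = [a - b for a, b in zip(iso1, iso2)]
    switches = []
    for k in range(len(diff) - 1):
        d0, d1 = diff[k], diff[k + 1]
        if d0 * d1 < 0:  # strict sign change; exact ties are ignored
            t = times[k] + (times[k + 1] - times[k]) * d0 / (d0 - d1)
            direction = "iso1_to_iso2" if d0 > 0 else "iso2_to_iso1"
            switches.append((t, direction))
    return switches


# Hypothetical abundances of two isoforms of one gene over a time course (h).
print(isoform_switches([0, 3, 6, 9], [10, 8, 4, 2], [2, 4, 8, 10]))
# [(4.5, 'iso1_to_iso2')]
```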

T-09: Integrative analysis of untranslated regions in human messenger RNAs uncovers G-quadruplexes as constrained regulatory features
COSI: RNA COSI
  • David S.M. Lee, University of Pennsylvania, United States
  • Louis R. Ghanem, Children's Hospital of Philadelphia, United States
  • Yoseph Barash, University of Pennsylvania, United States

Short Abstract: Identifying regulatory elements in the noncoding genome is a fundamental challenge in biology. G-quadruplex (G4) sequences are abundant in untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. By integrating multiple sources of genetic and genomic data, we show that putative G-quadruplex forming sequences (pG4) in 5’ and 3’ UTRs are selectively constrained, and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we uncover patterns of selection at single-nucleotide resolution in UTR pG4s supporting their role in mediating protein-binding via secondary-structure formation. In parallel, we identify new proteins with evidence for preferential binding at pG4s from ENCODE annotations, and delineate putative regulatory networks composed of shared binding targets. Finally, by mapping variants in the NIH GWAS Catalogue and ClinVar, we find enrichment for disease-associated variation in 3’UTR pG4s. At a GWAS pG4-variant associated with hypertension in HSPB7, we uncover robust allelic imbalance in GTEx RNA-seq across multiple tissues, suggesting that changes in gene expression associated with pG4 disruption underlie the observed phenotypic association. Taken together, our results establish UTR G-quadruplexes as important cis-regulatory features, and point to a putative link between disruption within UTR pG4 and susceptibility to human disease.
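Putative G-quadruplex sequences (pG4s) are commonly located with a consensus pattern of four runs of at least three guanines, separated by loops of 1 to 7 nucleotides. The sketch below applies that widely used pattern. It is an assumption that the authors used the same definition. The function only scans the strand it is given, which is the sense strand for mRNA UTRs.

```python
import re

# Common pG4 consensus: four runs of >= 3 G separated by loops of 1-7 nt.
PG4 = re.compile(r"(?:G{3,}[ACGTUN]{1,7}){3,}G{3,}")


def find_pg4(seq):
    """Return (start, end, motif) for non-overlapping pG4 matches on the given strand."""
    return [(m.start(), m.end(), m.group()) for m in PG4.finditer(seq.upper())]


print(find_pg4("aaGGGaGGGaGGGaGGGaa"))  # [(2, 17, 'GGGAGGGAGGGAGGG')]
```

Overlap with constraint scores, eQTLs or RBP binding sites would then be computed from these coordinates.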

T-10: miRAW: A deep learning approach to predict miRNA targets
COSI: RNA COSI
  • Siqing Liu, Department of Medical Genetics, Oslo University Hospital, Norway
  • Yafei Xing, Department of Medical Genetics, Oslo University Hospital, Norway
  • Albert Pla, Department of Medical Genetics, University of Oslo, Norway
  • Simon Rayner, Department of Medical Genetics, Oslo University Hospital & Hybrid Technology Hub, University of Oslo, Norway

Short Abstract: Previously, we developed miRAW, a miRNA:mRNA target prediction tool based on deep learning. Rather than incorporating human-crafted descriptors (which are potential sources of imprecision) during training, miRAW examines the entire miRNA and mRNA sequences in order to learn unconstrained feature descriptors related to the targeting process. Two additional factors help miRAW achieve superior performance compared to other miRNA targeting tools: the meticulous preparation of the training data and the consideration of an extended seed region. Around 17,000 validated miRNA:mRNA exact target sites were extracted by cross-referencing more than 150,000 experimentally validated Homo sapiens miRNA:gene targets with different CLIP-based datasets. In order to (i) identify potential factors that influence the binding process, and (ii) improve the user interface to miRAW, we have developed pipelines for filtering prediction results, visualising bindings between miRNA:mRNA targets and pairing at mutation locations. Finally, to improve prediction accuracy, miRAW has been applied to specific pathway predictions, e.g., miRNAs targeting the F5 gene. The predictions were then validated experimentally and the results fed back into the training model. A high-throughput experimental validation system that can use this feedback approach iteratively is in development.
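miRAW learns its features from full sequences rather than applying fixed rules. For contrast, the conventional seed rule that many target-prediction tools start from can be stated in a few lines: a candidate site is a stretch of mRNA perfectly complementary to miRNA nucleotides 2 to 8. The sketch below implements that rule, not miRAW. Its `end` parameter lets the seed be lengthened, loosely echoing the extended seed region mentioned above.

```python
def revcomp_rna(seq):
    """Reverse complement of an RNA sequence (T is treated as U)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq.upper().replace("T", "U")))


def seed_sites(mirna, utr, start=2, end=8):
    """0-based positions in `utr` perfectly complementary to miRNA nt start..end.

    start/end are 1-based and inclusive; the default 2..8 is the classic
    7-nt seed, and increasing `end` gives an extended seed.
    """
    site = revcomp_rna(mirna[start - 1:end])
    target = utr.upper().replace("T", "U")
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


# let-7a-5p against a toy 3'UTR fragment containing its 7-mer seed match.
print(seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAAA"))  # [3]
```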

T-11: Predicting canonical and non-canonical box C/D snoRNA interactions using machine learning
COSI: RNA COSI
  • Gabrielle Deschamps-Francoeur, Université de Sherbrooke, Canada
  • Michelle Scott, University of Sherbrooke, Canada

Short Abstract: Small nucleolar RNAs (snoRNAs) are small non-coding RNAs divided into two families, box C/D and box H/ACA, which are known to guide, respectively, the methylation and pseudouridylation of ribosomal and small nuclear RNAs. These functions are well characterized and require an interaction with specific regions of the snoRNAs. In humans, however, some snoRNAs have no known canonical targets. In addition, some snoRNAs exhibit non-canonical functions, such as regulation of alternative splicing and of mRNA stability, the deregulation of which has been implicated in diseases such as Prader-Willi syndrome and cancer. Recently developed methods allow the high-throughput detection of RNA-RNA interactions. The study of these datasets revealed that canonical interactions account for only 5% of all snoRNA interactions. The aim of this project is to develop a tool to predict snoRNA interactions, both canonical and non-canonical, using an artificial neural network. The datasets obtained with high-throughput interaction identification methods were analyzed and compared, together with a curation of the literature. The interaction sequences were fed to the network, resulting in an accuracy of 0.78. With this tool, we will be able to predict novel potential snoRNA interactions, shedding light on their non-canonical functions and their implication in different diseases.

T-12: High throughput experimental method improves deep learning prediction model of miRNA targets
COSI: RNA COSI
  • Siqing Liu, Department of Medical Genetics, Oslo University Hospital, Norway
  • Endalkachew Ashenafi Alemu, Department of Medical Genetics, Oslo University Hospital, Norway
  • Yafei Xing, Department of Medical Genetics, Oslo University Hospital, Norway
  • Albert Pla, Department of Medical Genetics, University of Oslo, Norway
  • Simon Rayner, Department of Medical Genetics, Oslo University Hospital & Hybrid Technology Hub, University of Oslo, Norway

Short Abstract: Deep learning is commonly applied to biological problems to predict or classify outcomes. In such a scenario, experimental data are used to train and test a model, which is then applied to predict outcomes in new situations. miRNA target prediction is one such example. miRNA-mRNA target data from HITS-CLIP and its variants are used to identify miRNA-mRNA target pairs, and these data are then used to train prediction models. However, a major problem is that there are insufficient data for training and testing: only a very small fraction of target events can be sampled in cells. Also, because studies seek to confirm target events, negative data are limited (in our recent publication we collected 33,912 positive and 1,096 negative experimentally verified human miRNA-mRNA interactions). In our approach, we are developing an experimental method that allows direct high-throughput investigation of the miRNA-mRNA target space. Thus, instead of generating experimental data to verify a prediction, we can generate data that improve the performance of a deep learning prediction model, yielding balanced positive and negative datasets. Our preliminary results support a proof of concept for this approach.
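The class imbalance quoted above (33,912 positive versus 1,096 negative interactions) shows why balanced data matter. A common purely computational workaround is random undersampling of the majority class. It is sketched below for comparison and is not part of the authors' method. It balances the classes by discarding most positive examples, a cost that generating additional experimental data aims to avoid.

```python
import random


def balance_by_undersampling(positives, negatives, seed=0):
    """Randomly downsample the larger class to the size of the smaller one."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


# Class sizes quoted above; integers stand in for interaction records.
pos, neg = balance_by_undersampling(list(range(33_912)), list(range(1_096)))
print(len(pos), len(neg))  # 1096 1096
```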

T-13: A landscape of circadian and ultradian alternative splicing in mammalian tissues
COSI: RNA COSI
  • Rukeia El-Athman, Institute for Theoretical Biology, Humboldt University Berlin and Charité Medical University Berlin, Germany

Short Abstract: Mounting evidence points to a role of the circadian clock in the temporal regulation of pre-mRNA splicing. To investigate whether the same gene can give rise to transcripts with divergent oscillatory patterns, we analyzed circadian and ultradian transcriptional rhythms of individual isoforms and compared them to those observed at the gene level in 12 mouse and 64 olive baboon tissues. We found various splicing-related genes with consistently 24-h rhythmic transcriptional activity across tissues and species that displayed a bimodal phase distribution. We further identified 24-h and 12-h rhythmic putative alternative splicing events in murine tissues and pairs of differentially 24-h and 12-h rhythmic splice isoforms of the same gene in baboon tissues whose expression peaked at opposing times of the day. Several of the candidate genes were associated with mRNA splicing processes, hinting at a reciprocal interplay between the observed circadian rhythmicity of splicing-related genes and time-of-day-dependent isoform production. We extended our findings by analyzing a novel dataset of two colorectal cancer cell lines in different progression stages from the same patient. Both displayed 24-h and 12-h rhythmic phase-shifted isoforms that differed between the primary tumor and the metastatic cell line, pointing to a role of rhythmic alternative splicing in tumor progression.
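Rhythmic expression at 24-h and 12-h periods is often quantified with a cosinor (harmonic regression) fit. The abstract does not name its rhythm-detection method, so the sketch below is only an assumed illustration. For equally spaced samples covering a whole number of cycles of each period, the least-squares cosinor coefficients reduce to simple Fourier sums. The function returns amplitude and peak time but performs no significance test.

```python
from math import atan2, cos, hypot, pi, sin


def harmonic_fit(times, values, period):
    """Cosinor-style amplitude and peak time for one period (same units as times).

    Assumes equally spaced samples spanning a whole number of cycles of each
    fitted period, where least squares reduces to these Fourier sums.
    """
    n = len(values)
    w = 2 * pi / period
    a = 2 / n * sum(y * cos(w * t) for t, y in zip(times, values))
    b = 2 / n * sum(y * sin(w * t) for t, y in zip(times, values))
    amplitude = hypot(a, b)
    # Model: y ~ mesor + amplitude * cos(w * (t - peak_time))
    peak_time = (atan2(b, a) / w) % period
    return amplitude, peak_time


# Synthetic 2-hourly series over 48 h with a 24-h and a 12-h rhythm.
times = list(range(0, 48, 2))
values = [5 + 2 * cos(2 * pi * (t - 6) / 24) + cos(2 * pi * (t - 3) / 12) for t in times]
print(harmonic_fit(times, values, 24))  # amplitude ~2, peak ~6 h
print(harmonic_fit(times, values, 12))  # amplitude ~1, peak ~3 h
```

Applying the same fit separately to gene-level and isoform-level values would expose isoforms whose phases differ from their gene's.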

T-14: Dual RNA-seq provides insight into the RNA biology of the neglected intracellular human pathogen Orientia tsutsugamushi
COSI: RNA COSI
  • Bozena Mika-Gospodorz, Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research (HZI), Würzburg, Germany

Short Abstract: Intracellular infection is a complex process driven by bacteria that invade and replicate in host cells, which in turn respond to this action. Changes in gene expression of both organisms reflect this process, so exploring the transcriptomic differences can help us to understand pathogenesis and infectious diseases. Dual RNA-seq is a technique that measures transcript expression in the host-pathogen system as a whole. Specifically, dual RNA-seq captures both coding and noncoding RNA expression patterns simultaneously in intracellular bacteria and their host, thus identifying new aspects of infection. Here, we investigate the biology of the obligate intracellular pathogen Orientia tsutsugamushi, which causes scrub typhus, a major neglected disease in South and Southeast Asia. We applied dual RNA-seq to an endothelial cell line infected with two Orientia strains, Karp and UT176, which differ in virulence. This has allowed us to assay differences in the host response to infection with the two strains, and to explore unusual aspects of the RNA biology of Orientia, including the expression of a two-piece tmRNA and widespread antisense regulation. Finally, applying a simple machine learning approach, we provide evidence that antisense transcription plays a major regulatory role in Orientia.

T-15: miRBaseMiner, a python package for curating miRNA annotation in miRBase
COSI: RNA COSI
  • Xiangfu Zhong, Oslo University Hospital, Norway
  • Simon Rayner, Department of Medical Genetics, Oslo University Hospital & Hybrid Technology Hub, University of Oslo, Norway

Short Abstract: MicroRNAs are small non-coding RNA molecules involved in gene regulation. For studies seeking to identify changes in miRNAs, miRBase is the standard reference source. miRBase is updated periodically, with each release including newly discovered miRNAs, modified entries and “dead” entries corresponding to deleted entries. However, the miRBase authors emphasize that they provide only “minimal gate-keeping” to ensure annotation quality. While efforts have been made to provide a measure of the variation by identifying “incorrect” annotation, these depend on the definition of a miRNA and on the miRBase version, leading to significant variation in the identification of “reliable” entries. Thus, there is no straightforward way to explore miRBase annotation. We have developed a Python package, miRBaseMiner, for investigating miRBase annotation and generating custom annotation sets. We characterized each miRBase release from v9.2 to v22 and found entries with: (1) identical sequences; (2) multiple genome locations; (3) reverse complementarity; (4) 3' poly(A) ends; (5) high sequence similarity. We also found pre-miRNAs with extremely low stability. As each of these factors can affect the identification of dysregulated features and subsequent clinical or biological conclusions, miRBaseMiner is a valuable resource for anyone using miRBase as a reference source. miRBaseMiner is freely available on GitHub and PyPI.
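Several of the annotation checks listed above reduce to simple string operations on mature sequences. The sketch below flags identical sequences, reverse-complementary pairs and 3' poly(A) ends. It does not use miRBaseMiner's API. The poly(A) threshold of four adenosines, the entry names and the short toy sequences are all hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(seq):
    """Reverse complement of an RNA sequence."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq))


def flag_mature_entries(mature, polya_len=4):
    """Flag suspicious mature miRNA entries.

    mature    -- dict: entry name -> mature sequence (RNA alphabet, upper case)
    polya_len -- hypothetical minimum run of 3' adenosines to flag
    """
    by_seq = defaultdict(list)
    for name, seq in mature.items():
        by_seq[seq].append(name)

    flags = {
        # (1) different entries sharing an identical sequence
        "identical": [sorted(names) for names in by_seq.values() if len(names) > 1],
        # (4) 3' poly(A) ends
        "polyA_3prime": [n for n, s in mature.items() if s.endswith("A" * polya_len)],
        # (3) reverse-complementary pairs, each pair reported once
        "reverse_complement": [],
    }
    for seq, names in by_seq.items():
        rc = revcomp(seq)
        if rc in by_seq and seq < rc:
            flags["reverse_complement"].append((sorted(names), sorted(by_seq[rc])))
    return flags


entries = {"mir-a": "UGAGGUAG", "mir-b": "UGAGGUAG",
           "mir-c": "CUACCUCA", "mir-d": "ACGUAAAA"}
print(flag_mature_entries(entries))
```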

T-16: Single cell transcriptomics of liver-expressed long non-coding RNAs
COSI: RNA COSI
  • Kritika Karri, Boston University, United States

Short Abstract: The liver exhibits striking metabolic zonation, with distinct functions and expression patterns for hepatocytes proximal to the portal vein compared to cells surrounding the central vein. However, little is known about the zonation of liver lncRNAs, which have diverse functions and regulatory activities. We analyzed mouse liver single cell RNA-seq data (Smart-seq2, Drop-Seq) to detect protein-coding mRNAs and lncRNAs at single cell resolution. 4,500 liver-expressed lncRNAs were detectable at >4 transcripts per million reads. Seurat analysis identified 130 lncRNAs as markers specific to individual liver cell types: hepatocytes, endothelial cells, Kupffer cells, B cells and NK cells. A further 28 lncRNAs showed zonated hepatocyte expression patterns based on established landmark genes. We also identified lncRNAs showing sex-biased expression. In addition, 234 of 412 lncRNAs responsive to the non-genotoxic hepatocarcinogen TCPOBOP were detected; these include lncRNAs specific to non-hepatocyte clusters and lncRNAs showing zonated expression in hepatocytes. These findings are being used to identify protein-coding gene targets of zone co-localized regulatory lncRNAs and obtain insight into the zone-based biological pathways they may regulate. These analyses also demonstrate that xenobiotic exposure can dysregulate lncRNAs expressed in multiple liver cell types.

T-17: Benchmarking the impact of data transformation, pre-processing and choice of method in the computational deconvolution of transcriptomics data
COSI: RNA COSI
  • Francisco Avila Cobos, Ghent University, Belgium
  • José Alquicira-Hernandez, Garvan Institute of Medical Research, Australia
  • Jo Vandesompele, Ghent University, Belgium
  • Joseph Powell, Garvan Institute of Medical Research, Australia
  • Pieter Mestdagh, Ghent University, Belgium
  • Katleen De Preter, Ghent University, Belgium

Short Abstract: Many computational methods have been developed to infer the proportions of individual cell types from bulk transcriptomics data (computational deconvolution). Previous comparisons of these methods revealed that the choice of reference signatures is far more important than the method itself. However, a thorough evaluation of the combined impact of data transformation, pre-processing and methodology on the results is still lacking. Using single-cell RNA-sequencing (scRNA-seq) data from human pancreas and PBMCs, we artificially generated hundreds of pseudo-bulk mixtures with varying numbers of cells and cell types in known proportions, allowing the combined impact on the deconvolution results to be evaluated. Among the methods for deconvolution of bulk RNA-seq data, we included MuSiC, a method designed to infer the cell type composition of bulk data using scRNA-seq data as reference. Moreover, since most methods require an additional reference matrix containing cell-type-specific expression values, we assessed the effect of removing from the reference cell types that were actually present in the mixtures. Further in-depth analyses are currently ongoing. (*) Equal contribution
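Pseudo-bulk mixtures like those described above can be built by summing the profiles of randomly drawn single cells in chosen proportions, so that the true composition is known. The sketch below assumes per-cell count dictionaries. Sampling with replacement and rounding of the per-type cell numbers are choices made here, not details from the abstract.

```python
import random
from collections import defaultdict


def make_pseudobulk(cells, proportions, n_cells, seed=0):
    """Build one pseudo-bulk profile with known cell-type proportions.

    cells       -- list of (cell_type, {gene: count}) from a scRNA-seq dataset
    proportions -- dict: cell_type -> fraction of the mixture (summing to 1)
    n_cells     -- total number of cells to pool
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for cell_type, profile in cells:
        by_type[cell_type].append(profile)

    bulk = defaultdict(float)
    for cell_type, fraction in proportions.items():
        k = round(fraction * n_cells)  # rounding may shift the total slightly
        # Draw cells of this type with replacement and add up their counts.
        for profile in rng.choices(by_type[cell_type], k=k):
            for gene, count in profile.items():
                bulk[gene] += count
    return dict(bulk)


# Toy data: one alpha-cell (GCG) and one beta-cell (INS) profile.
cells = [("alpha", {"GCG": 10}), ("beta", {"INS": 20})]
print(make_pseudobulk(cells, {"alpha": 0.25, "beta": 0.75}, n_cells=4))
# {'GCG': 10.0, 'INS': 60.0}
```

Dropping a cell type from the reference while keeping it in `proportions` reproduces the kind of mismatch the abstract evaluates.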

T-18: Association of Cis-regulatory G-quadruplex Motifs With Splice Sites in the Human Genome
COSI: RNA COSI
  • Vanesa Getseva, Ramapo College of New Jersey, United States
  • Scott Frees, Ramapo College of New Jersey, United States
  • Paramjeet Bagga, Ramapo College of New Jersey, United States

Short Abstract: Our lab is interested in the role of cis-regulatory motifs, such as Quadruplex forming G-Rich Sequences (QGRS), in RNA processing. In the current project, we focused on computationally identifying QGRS distribution patterns near splice sites in human protein-coding genes, with the goal of investigating their role in regulated splicing. We developed scripts in Python3 and C++, based on our previously established QGRS Mapper program, to map QGRS motifs. Our analysis revealed a preferential association of QGRS motifs with splice sites in exons and introns. We also observed differential QGRS distribution patterns between 5’ and 3’ splice sites. RNA QGRS motifs in the vicinity of specific splice sites may modulate splicing through interactions with regulatory proteins that bind G-rich sequences. Furthermore, QGRS motifs were significantly more likely to overlap alternatively spliced sites than constitutive sites, suggesting a role in regulated alternative processing. Our data suggest that QGRS motifs are likely involved in influencing the splicing of human protein-coding genes on a genomic scale.
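Motif mapping near splice sites can be sketched with a regular expression. The sketch assumes a simplified QGRS definition of four equal-length G-runs of at least two Gs, separated by loops of 1 to 7 nt. QGRS Mapper's own length limits and G-score are not reproduced. The helper names and the window size are hypothetical.

```python
import re

# Simplified QGRS pattern (assumption): four G-runs of >= 2 Gs separated by
# loops of 1-7 nt. The lookahead lets overlapping motifs be reported; the
# backreference \2 requires each later G-run to repeat the first run's length
# (loops may still contain Gs in this simplified version).
QGRS = re.compile(r"(?=((G{2,})[ACGTU]{1,7}?\2[ACGTU]{1,7}?\2[ACGTU]{1,7}?\2))")


def find_qgrs(seq):
    """Return (start, motif) for every simplified QGRS in seq."""
    return [(m.start(1), m.group(1)) for m in QGRS.finditer(seq.upper())]


def qgrs_near_site(seq, site, window=100):
    """Keep motifs whose start lies within `window` nt of a splice site."""
    return [(pos, motif) for pos, motif in find_qgrs(seq) if abs(pos - site) <= window]


seq = "TTTT" + "GGAGGAGGAGG" + "TTTT"
print(find_qgrs(seq))  # [(4, 'GGAGGAGGAGG')]
```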

T-19: Pan-transcriptomic analysis identified common differentially expressed genes in response to polymyxins in Acinetobacter baumannii
COSI: RNA COSI
  • Mengyao Li, Monash University, Australia
  • Yan Zhu, Monash University, Australia
  • Jian Li, Monash University, Australia

Short Abstract: Polymyxins are last-line antibiotics against multidrug-resistant (MDR) Acinetobacter baumannii. Increasing reports of polymyxin resistance make the development of novel antimicrobial therapies urgent. However, it is largely unclear how A. baumannii responds to polymyxin treatment. Our study aimed to conduct a pan-transcriptomic analysis of A. baumannii to determine its common gene expression response. RNA-Seq raw reads of five A. baumannii strains were collected from the Gene Expression Omnibus and aligned to reference genomes with Subread. Read counts were summarised with featureCounts, and differentially expressed genes were determined with edgeR. Orthologs across the five strains were determined using Roary, and functional enrichment analysis was conducted to identify significantly perturbed pathways. Overall, 2,822 orthologs were identified across the five strains. After 0.75× MIC or 2 mg/L polymyxin treatment for 15 min, 41 genes were commonly up-regulated, including those related to lipoprotein and phospholipid trafficking, the BaeSR two-component system, efflux pumps and poly-N-acetylglucosamine biosynthesis; six genes were commonly down-regulated, three of which are involved in fatty acid biosynthesis. This pan-transcriptomic study suggests that, in A. baumannii, polymyxins rapidly damage bacterial membrane integrity, induce the expression of efflux pumps and suppress fatty acid biosynthesis. These findings provide important mechanistic insights for optimising novel polymyxin therapies against MDR A. baumannii.
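The "commonly up- or down-regulated" step amounts to mapping each strain's DE genes onto shared orthogroups and intersecting the sets. The sketch below assumes a per-strain locus-tag-to-orthogroup mapping, which could be derived from Roary output. The function name, locus tags and orthogroup IDs are hypothetical.

```python
def common_de_orthogroups(de_by_strain, ortho_by_strain):
    """Orthogroups called differentially expressed in every strain.

    de_by_strain:    strain -> set of locus tags called DE in that strain
    ortho_by_strain: strain -> {locus tag: orthogroup id}
    Genes without an orthogroup assignment are ignored.
    """
    per_strain = []
    for strain, de_genes in de_by_strain.items():
        mapping = ortho_by_strain[strain]
        per_strain.append({mapping[g] for g in de_genes if g in mapping})
    return set.intersection(*per_strain) if per_strain else set()


# Toy example with two hypothetical strains
de = {"S1": {"s1_001", "s1_002"}, "S2": {"s2_010", "s2_011"}}
ortho = {"S1": {"s1_001": "og1", "s1_002": "og2"},
         "S2": {"s2_010": "og1", "s2_011": "og3"}}
print(common_de_orthogroups(de, ortho))  # {'og1'}
```

Running it separately on up- and down-regulated sets keeps the direction of change consistent across strains.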

T-20: Transcriptional Landscape of Human Progenitor Cell Populations
COSI: RNA COSI
  • Maina Bitar, QIMR Berghofer, Australia
  • Isabela Pimentel de Almeida, Universidade de Sao Paulo and QIMR Berghofer, Brazil
  • Elizabeth O'Brien, QIMR Berghofer, Australia
  • Guy Barry, QIMR Berghofer, Australia

Short Abstract: Somatic progenitor cells are crucial for human tissue development and maintenance. These unspecialized, often tissue-specific cells have recently begun to be used clinically for organ repair and regeneration. As unique cell populations, their molecular composition is of great interest, and panoramic views of their transcriptional landscape can support further developments in the field. Here we investigate, for the first time, a set of five distinct human progenitor cell types using state-of-the-art bioinformatics tools. We uncover similarities and differences between the cell types based on transcript expression profiles obtained by RNA-Seq. Our analyses suggest very high overall similarity among these progenitor cell populations at the transcriptome level. Despite this high similarity, we explored the differences between them and found unique transcriptional signatures and cell type-specific coding and non-coding transcripts. This study shows that, although progenitor cells are transcriptionally very similar, a minority of specialized and uniquely expressed transcripts distinguishes each cell type. Functional exploration of the transcriptomic similarities and differences between progenitor cells provided essential knowledge about unique cellular markers and about the shared and distinct functions required for progenitor cell differentiation into the defined cell populations that constitute our tissues and organs.

T-21: PolyA-miner: Accurate Estimation of Alternative Poly-Adenylation from 3’Seq data using Non-negative matrix factorization and Vector algebra
COSI: RNA COSI
  • Hari Krishna Yalamanchili, Baylor College of Medicine, United States
  • Zhandong Liu, Baylor College of Medicine, United States

Short Abstract: More than half of human genes use alternative polyadenylation (APA) to generate different mRNA transcripts. The increasing significance of APA in disease has propelled the development of several 3’ sequencing techniques. Despite this, no computational tools have been designed specifically for 3’Seq data. Here we present PolyA-miner, a novel algorithm for quantifying alternative polyadenylation based on non-negative matrix factorization (NMF). A gene is abstracted as a matrix of polyadenylation sites, and an iterative consensus NMF is executed to extract a robust dichotomization of samples. Statistical significance is evaluated as the goodness-of-fit of the dichotomization over a null model. We evaluated PolyA-miner on glioblastoma cell line PAC-seq data. Strikingly, 1,418 genes with APA changes were identified, in contrast to 695 genes reported in the original study. In addition, 157 genes with novel polyadenylation sites were identified. PolyA-miner is the first computational tool specifically designed for 3’Seq data. Iterative consensus NMF makes it less susceptible to sample variation. It can effectively identify novel APA sites and account for all APA changes, including non-proximal to non-distal changes. With the emerging importance of APA in human diseases, PolyA-miner can significantly accelerate analysis and help decode the missing pieces of underlying APA dynamics.
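The consensus-NMF idea can be sketched on a single gene's (polyA sites × samples) matrix. Each run factorises the matrix at rank 2 and assigns every sample to its dominant component. Co-membership is then averaged over random restarts, and samples are split by their consensus with the first sample. Several choices here are assumptions rather than PolyA-miner's documented algorithm: Lee-Seung multiplicative updates, argmax assignment, the 0.5 cutoff and the run count. The null-model significance test is not sketched.

```python
import numpy as np


def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Rank-`rank` NMF X ~ W @ H via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Normalise W columns to sum to 1 and move the scale into H
    scale = W.sum(axis=0) + eps
    return W / scale, H * scale[:, None]


def consensus_dichotomy(X, n_runs=20):
    """Split samples into two groups using co-membership across NMF runs."""
    m = X.shape[1]
    consensus = np.zeros((m, m))
    for seed in range(n_runs):
        _, H = nmf(X, rank=2, seed=seed)
        labels = H.argmax(axis=0)  # dominant component per sample
        consensus += labels[:, None] == labels[None, :]
    consensus /= n_runs
    groups = (consensus[0] < 0.5).astype(int)  # 0 = same group as sample 0
    return groups, consensus


# Toy gene: 2 polyA sites (proximal, distal) x 6 samples
X = np.array([[10, 10, 10, 1, 1, 1],
              [1, 1, 1, 10, 10, 10]], dtype=float)
groups, consensus = consensus_dichotomy(X)
print(groups)
```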

T-22: Genome wide small RNA profiling reveals dynamic transcriptome adaptation in Paramecium
COSI: RNA COSI
  • Sivarajan Karunanithi, Goethe University, Germany
  • Vidya Oruganti, MPI for Plant Breeding Research, Germany
  • Simone Marker, Saarland University, Germany
  • Angela M Rodriguez-Viana, Saarland University, Germany
  • Franziska Drews, Saarland University, Germany
  • Marcello Pirritano, Saarland University, Germany
  • Karl Nordstroem, Saarland University, Germany
  • Martin Simon, Saarland University, Germany
  • Marcel Schulz, Goethe University, Germany

Short Abstract: Exogenous RNAi pathways regulating gene expression during vegetative growth of the unicellular model ciliate Paramecium have been widely studied. However, the involvement of RNAi in endogenous transcriptome regulation and environmental adaptation is unknown. We developed a pipeline to profile genome-wide endogenous siRNAs in different transcriptomic states. We characterized 2,602 siRNA-producing clusters (SRCs). Our data show no evidence for the production of miRNAs from these SRCs, unlike in other species. Notably, most SRCs overlap with coding genes, and some SRCs showed siRNA phasing along the entire ORF. These observations, along with exon-exon junction analysis, suggest that the mRNA transcript is a potential source of siRNAs. We identified a group of 915 genes overlapping SRCs that were highly expressed in all transcriptomic states. However, an integrative analysis of siRNA abundance and gene expression revealed both negative and positive associations. Two RNA-dependent RNA polymerase mutants, RDR1 and RDR2, show a drastic loss of siRNAs, especially in phased SRCs, accompanied by increased mRNA levels. Most SRCs show dependency on both RDRs, like the primary siRNAs in RNAi against exogenous RNA, suggesting mechanistic overlaps between exogenous and endogenous RNAi.
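Phasing can be illustrated by checking whether siRNA 5′ ends within a cluster fall into one register modulo a fixed period. This is a minimal sketch, not the authors' pipeline. The 23-nt period is an illustrative assumption and should be matched to the observed siRNA size. The function name is hypothetical. Without phasing, each register is expected to hold roughly 1/period of the reads.

```python
from collections import Counter


def dominant_phase(read_starts, period=23):
    """Return (register, fraction) for the most common 5'-start register.

    read_starts: 0-based 5' positions of siRNA reads on one strand of a cluster.
    """
    if not read_starts:
        return None, 0.0
    registers = Counter(start % period for start in read_starts)
    register, count = registers.most_common(1)[0]
    return register, count / len(read_starts)


print(dominant_phase([100, 123, 146, 169, 105]))  # (8, 0.8)
```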

T-23: DTUrtle - an easily accessible Differential Transcript Usage analysis pipeline
COSI: RNA COSI
  • Tobias Tekath, Institute of Medical Informatics - University of Muenster, Germany

Short Abstract: Motivation: Common differential gene expression (DEG) analysis of RNA-seq data uses gene-level quantification counts to detect significant expression changes. Since the rise of fast and reliable transcript quantifiers such as Salmon, Kallisto and RSEM, these gene counts are often the sum of the individual counts of a gene's transcripts. Complementary to DEG analysis, proportional changes in the transcript composition of a gene are of great interest for many research questions, such as the analysis of differential splicing. Results: We propose DTUrtle, a state-of-the-art RNA-seq pipeline that uses these transcript counts to analyze differential transcript usage (DTU). The pipeline combines pre-processing, quantification and DTU calling in a user-friendly way and integrates the results into a searchable and filterable overview table. DTUrtle focuses on statistical validity and introduces additional filtering steps to improve DTU-calling performance as well as run time for big datasets. The overview table integrates additional information and visualizations that enable the user to inspect elements of interest in further detail. DTUrtle has been successfully applied to a dataset of 145 human samples. Conclusion: DTUrtle enables researchers to perform DTU analysis conveniently and facilitates an in-depth inspection of the results.
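The quantity behind DTU is the per-gene transcript proportion, not the gene total. The descriptive sketch below computes the mean change in usage between two groups. It is not DTUrtle's statistical test, and the function names and counts are hypothetical.

```python
def proportions(counts):
    """Convert one sample's transcript counts for a gene into usage proportions."""
    total = sum(counts.values())
    return {t: (c / total if total else 0.0) for t, c in counts.items()}


def mean_usage(samples):
    """Average transcript proportions over a group of samples."""
    props = [proportions(s) for s in samples]
    transcripts = sorted({t for p in props for t in p})
    return {t: sum(p.get(t, 0.0) for p in props) / len(props) for t in transcripts}


def usage_shift(group_a, group_b):
    """Per-transcript change in mean usage from group_a to group_b."""
    a, b = mean_usage(group_a), mean_usage(group_b)
    return {t: b.get(t, 0.0) - a.get(t, 0.0) for t in sorted(set(a) | set(b))}


# Gene total is 100 in every sample, so gene-level DE sees no change
ctrl = [{"tx1": 90, "tx2": 10}, {"tx1": 80, "tx2": 20}]
treat = [{"tx1": 20, "tx2": 80}, {"tx1": 30, "tx2": 70}]
shift = usage_shift(ctrl, treat)
print(shift)  # tx1 about -0.6, tx2 about +0.6
```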

T-24: RNA sequence-structure alignment for comparing pseudoknot structures and virus terminals
COSI: RNA COSI
  • Gianvito Urgese, Politecnico di Torino, Italy
  • Jörg Winkler, Algorithmic Bioinformatics, Institute of Computer Science, Freie Universität Berlin, Germany
  • Elisa Ficarra, Politecnico di Torino, Italy
  • Knut Reinert, Algorithmic Bioinformatics, Institute of Computer Science, Freie Universität Berlin, Germany

Short Abstract: Unlike DNA, RNA can fold into intricate structures that appear to be essential for the function of non-coding RNAs and RNA viruses. Standard alignment algorithms, which consider only the nucleotide sequence, are not adequate to support biologists in comparing RNA fragments against databases of RNAs whose secondary structures and functions are known. The spatial folding of an RNA largely determines its function, so RNA alignment algorithms have to take structural information into account during the alignment process. In this work, we evaluate the alignment performance of an improved and parallelised version of the LaRA program, implemented using the SeqAn C++ library. Unlike many DP-based algorithms, this tool can natively handle arbitrary pseudoknots and, when coupled with multiple sequence alignment algorithms such as T-Coffee and MAFFT, can produce multiple sequence-structure alignments (MSSA) of RNA sequences together with their consensus structure. We tested the ability of our LaRA-based MSSA to generate reliable consensus structures on two sets of clusters of RNA sequences identified in the literature. The first set collects sequences known to contain pseudoknots in their secondary structure, whereas the second collects fragments of RNA viruses, clustered by family, that share a conserved structure but low sequence identity.
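A pseudoknot is a pair of base pairs (i, j) and (k, l) that cross, i.e. i < k < j < l. Nested dynamic-programming folding cannot represent this case. The sketch below reads an extended dot-bracket string with several bracket types and reports crossing pairs. It only illustrates the definition and is not LaRA's internal representation. The helper names are hypothetical.

```python
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in BRACKETS.items()}


def pairs_from_dot_bracket(structure):
    """Base pairs (i, j) from extended dot-bracket with several bracket types."""
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(j)
        elif ch in CLOSERS:
            pairs.append((stacks[CLOSERS[ch]].pop(), j))
    return sorted(pairs)


def crossing_pairs(pairs):
    """Pairs of base pairs (i, j), (k, l) with i < k < j < l, i.e. pseudoknots."""
    return [
        (p, q)
        for idx, p in enumerate(pairs)
        for q in pairs[idx + 1:]
        if p[0] < q[0] < p[1] < q[1]
    ]


# An H-type pseudoknot: the [] helix crosses the () helix
print(len(crossing_pairs(pairs_from_dot_bracket("((..[[..))..]]"))))  # 4
```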

T-25: Gene fusions as prognostic markers for Prostate Cancer
COSI: RNA COSI
  • Carolin Schimmelpfennig, Fraunhofer Institute of Cell Therapy and Immunology IZI, Germany
  • Markus Kreuz, Fraunhofer Institute of Cell Therapy and Immunology IZI; Medical Faculty, University of Leipzig, Germany
  • Susanne Füssel, University Hospital and Faculty of Medicine, Technical University of Dresden, Germany
  • Manfred Wirth, University Hospital and Faculty of Medicine, Technical University of Dresden, Germany
  • Friedemann Horn, Fraunhofer Institute of Cell Therapy and Immunology IZI; Medical Faculty, University of Leipzig, Germany
  • Kristin Reiche, Fraunhofer Institute of Cell Therapy and Immunology IZI; Medical Faculty, University of Leipzig, Germany

Short Abstract: Background: Prostate cancer (PCa) is the most prevalent cancer and the third most common cause of cancer-related death among European men. The clinical behavior of localized PCa is highly variable, ranging from aggressive cancers leading to death of disease (DoD) to indolent cancers that may be safely observed. To avoid overtreatment, novel biomarkers are urgently needed to support clinical decision making for PCa patients. Methods: In total, 40 tissue specimens from PCa patients with long-term follow-up and 8 controls were assessed by transcriptome-wide next-generation sequencing. After primary analysis and quality filtering, we used the software FusionCatcher to detect gene fusions. Additional filtering steps based on biological and technical criteria were developed to reduce false-positive fusions. Results: Among others, we detected well-known gene fusions associated with PCa development, such as TMPRSS2-ERG, SLC45A2-AMACR and SCHLAP1-UBE2E3. In samples affected by gene fusions involving ERG, we observed massive over-expression of ERG, resulting in a global shift of the transcriptomic landscape. To assess the prognostic potential of gene fusions, we analysed their association with DoD and adverse pathology, such as high Gleason scores or lymph node involvement.

T-26: tailfindr: Alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing
COSI: RNA COSI
  • Adnan M. Niazi, University of Bergen, Norway
  • Maximilian Krause, University of Bergen, Norway
  • Kornel Labun, University of Bergen, Norway
  • Yamila N. Torres Cleuren, University of Bergen, Norway
  • Florian Muller, University of Bergen, Norway
  • Eivind Valen, University of Bergen, Norway

Short Abstract: Polyadenylation at the 3’ end is a major regulator of messenger RNA, and poly(A) tail length is known to affect nuclear export, stability and translation, among other processes. Only recently have strategies emerged that allow genome-wide assessment of poly(A) length. These methods link poly(A) tail measurements to genes indirectly, by short-read alignment to gene 3’ ends. Concurrently, Oxford Nanopore Technologies (ONT) established full-length isoform RNA sequencing that captures the entire poly(A) tail. However, assessing poly(A) length through basecalling has so far not been possible due to the inability to resolve long homopolymeric stretches in ONT sequencing. Here we present tailfindr, an R package to estimate poly(A) tail length from ONT long-read sequencing data. tailfindr operates on unaligned, basecalled data. It measures poly(A) tail length from both native RNA and DNA sequencing, which makes poly(A) tail studies by full-length cDNA approaches possible for the first time. We assess tailfindr’s performance across different poly(A) lengths, demonstrating that tailfindr is a versatile tool providing poly(A) tail estimates across a wide range of sequencing conditions.
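Because basecalling cannot count homopolymer bases, tail length has to be inferred from signal duration. The hypothetical sketch below makes several assumptions. The poly(A) segment is taken to be a long, comparatively flat stretch of raw current, found here with a naive band threshold. Its duration in samples is then divided by the read's average samples per nucleotide to give a length in nucleotides. This illustrates the idea only and does not reproduce tailfindr's segmentation or normalisation.

```python
def longest_flat_run(signal, low, high):
    """Start index and length of the longest run of samples inside [low, high].

    A naive stand-in for poly(A) segment detection.
    """
    best_start, best_len, start = 0, 0, None
    for i, x in enumerate(signal):
        if low <= x <= high:
            if start is None:
                start = i
            if i - start + 1 > best_len:
                best_start, best_len = start, i - start + 1
        else:
            start = None
    return best_start, best_len


def estimate_tail_length(tail_samples, read_samples, read_length_nt):
    """Convert a tail's duration in raw samples into nucleotides.

    Normalises by the read's average number of samples per nucleotide,
    since translocation speed varies between reads.
    """
    samples_per_nt = read_samples / read_length_nt
    return tail_samples / samples_per_nt


print(longest_flat_run([50, 52, 100, 101, 99, 100, 48], 95, 105))  # (2, 4)
print(estimate_tail_length(3000, 30000, 1000))  # 100.0
```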

T-27: Circular RNA landscape in the ageing African turquoise killifish
COSI: RNA COSI
  • Franziska Metge, Max-Planck-Institute for Biology of Ageing, Germany
  • Jorge Boucas, Max-Planck-Institute for Biology of Ageing, Germany

Short Abstract: CircRNAs are a subgroup of RNAs that form a circular molecule. During back-splicing, the splice donor at the 3’ end of a downstream exon loops back and forms a covalent bond with the splice acceptor at the 5’ end of an upstream exon, instead of with a downstream acceptor. The majority of circRNAs are non-coding splice isoforms of protein-coding genes showing tissue- and time-specific expression. Because circRNAs have neither a 5’ cap nor a 3’ poly(A) tail, they degrade more slowly than their linear counterparts. Although only a few studies have shown a direct function for circRNAs, circRNAs have been linked to the regulation of their host genes' expression levels. In this work, we sequenced 23 samples from three tissues (brain, muscle, gut) at three (two for gut) time points throughout the life of the African turquoise killifish. We identified 1,810 unique circRNAs, half of which are conserved in human and mouse. We show that a third of these circRNAs are shared among all three tissues, while one third is specific to brain. With this study we provide a comprehensive atlas of the circRNA landscape in the ageing African turquoise killifish, a valuable resource for both the circRNA and the killifish communities.

T-28: Assessing the performance of coevolution-based RNA contact prediction
COSI: RNA COSI
  • Emanuel Peter, Juelich Supercomputer Center, Germany
  • Mehari Zerihun, Karlsruhe Institute of Technology, Germany
  • Alexander Schug, Juelich Supercomputer Center, Germany
  • Fabrizio Pucci, ULB, Belgium

Short Abstract: High-throughput sequencing technologies provide an invaluable source of evolutionary information that can be used to improve RNA structure prediction methods. To exploit this information, many statistical methods have been developed in the last decade to identify co-evolving nucleotide pairs in a multiple sequence alignment. Here we construct a curated database of about seventy high-resolution RNA 3D structures from the Protein Data Bank, filtered by a threshold on pairwise sequence identity. On this dataset, we compare the performance of different contact prediction methods that use co-evolutionary information, such as mean-field and pseudo-likelihood maximization direct coupling analysis. Finally, we present preliminary results on the use of machine learning to improve coevolution-based RNA contact prediction.
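Coevolution scores rank alignment column pairs by how strongly their nucleotides co-vary. The sketch below uses mutual information with the average product correction (APC), a classic baseline. It swaps in a simpler technique than the mean-field or pseudo-likelihood DCA methods compared in the abstract. The function names and the toy alignment are hypothetical.

```python
import itertools
import math
from collections import Counter


def column_mi(col_a, col_b):
    """Mutual information (bits) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        p = c / n
        mi += p * math.log2(p / ((pa[a] / n) * (pb[b] / n)))
    return mi


def mi_apc_scores(alignment):
    """Rank column pairs by MI minus the average product correction (APC)."""
    length = len(alignment[0])
    cols = [[seq[k] for seq in alignment] for k in range(length)]
    mi = {(i, j): column_mi(cols[i], cols[j])
          for i, j in itertools.combinations(range(length), 2)}
    # Mean MI of each column with all other columns
    col_mean = [0.0] * length
    for (i, j), v in mi.items():
        col_mean[i] += v / (length - 1)
        col_mean[j] += v / (length - 1)
    overall = sum(mi.values()) / len(mi)
    scores = {}
    for (i, j), v in mi.items():
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores[(i, j)] = v - apc
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy alignment: columns 0 and 3 co-vary as Watson-Crick pairs
aln = ["GACC", "CACG", "AAGU", "UAGA"]
print(mi_apc_scores(aln)[0])
```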

T-29: Transcriptomics of cardiac biopsies reveals differences in patients with or without diagnostic parameters for heart failure with preserved ejection fraction
COSI: RNA COSI
  • Christoffer Frisk, Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Sweden
  • Sarbahis Das, Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Sweden
  • Maria J Eriksson, Karolinska University Hospital, Department of Clinical Physiology, Sweden
  • Anna Walentinsson, Translational Sciences, Cardiovascular, Renal and Metabolic Diseases, IMED Biotech Unit, AstraZeneca, Sweden
  • Matthias Corbascio, Karolinska Institutet, Department of Molecular Medicine and Surgery, Sweden
  • Camilla Hage, Karolinska Institutet, Department of Medicine, Sweden
  • Chanchal Kumar, Astra Zeneca, Department of Medicine Karolinska Institutet, Sweden
  • Michalea Asp, Science for Life Laboratory, Royal Institute of Technology, Sweden
  • Joakim Lundeberg, Science for Life Laboratory, Royal Institute of Technology, Sweden
  • Eva Maret, Karolinska University Hospital, Department of Clinical Physiology, Sweden
  • Hans Persson, Karolinska Institutet, Department of Clinical Sciences, Sweden
  • Cecilia Linde, Karolinska Institutet, Department of Clinical Sciences, Sweden
  • Bengt Persson, Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Sweden

Short Abstract: Background. Heart failure (HF) affects 2–3 % of the adult Western population, and its prevalence is increasing, in particular the proportion of heart failure with preserved left ventricular (LV) ejection fraction (PEF). We hypothesized that patients undergoing elective coronary artery bypass grafting (CABG) with PEF physiology would show distinctive gene expression compared to patients with normal LV physiology. Methods. Cardiac biopsies from the LV were obtained from CABG patients. Patients were divided into two groups, Normal or PEF physiology, according to echocardiography, NT-proBNP levels and HF guideline definitions. Results. Of 16 patients in total, 5 were classified as PEF and 11 as Normal physiology. In principal component analysis, the samples clearly clustered into these two groups. In total, 743 differentially expressed genes were identified and analyzed to characterize functional correlations and regulatory properties. These properties include activation of the transcription factors HEY2 and KDM5A and inhibition of STAT4, SRF and TP53, with TP53 regulating 13% of the differentially expressed genes. The top biological functions associated with down-regulated genes in PEF were cardiac muscle contraction, oxidative phosphorylation, endocytosis and matrix organization. Conclusions. This exploratory study showed that patients undergoing elective CABG with PEF physiology had distinctive gene expression compared to patients with normal physiology.

T-30: Predicting isoform transcripts: lessons from human, mouse and dog
COSI: RNA COSI
  • Nicolas Guillaudeux, Univ Rennes, Inria, CNRS, IRISA, France
  • Catherine Belleannée, Univ Rennes, Inria, CNRS, IRISA, France
  • Samuel Blanquart, Univ Rennes, Inria, CNRS, IRISA, France
  • Jean-Stéphane Varré, CRIStAL - CNRS UMR 9189 - Université Lille - INRIA Lille-Nord Europe, France

Short Abstract: Alternative transcription and alternative splicing mechanisms allow a eukaryotic gene to express a large diversity of isoform transcripts, each made of a specific combination of genomic segments, the exons. Predicting the whole catalog of isoform transcripts that a gene can express remains difficult. We have proposed a comparative genomics method that identifies orthologous exons shared by a pair of orthologous genes. This method takes the functional sites (start and stop codons, splice sites) known in a source gene and transposes them, through sequence homology search, onto the sequence of an orthologous gene. The resulting gene model makes it possible to determine whether the target gene can express an ortholog of a known transcript of the source gene. In this work, we adapt the approach to multi-species comparison. We apply it to a set of 2,167 orthologous genes shared between human, mouse and dog, and we predict several thousand new orthologous transcripts. We then identify a set of 135 genes that share the same functional sites and express the same transcripts in all three species. Finally, we validate some of the transcript predictions using annotations and sequencing data.

T-31: Advanced and reproducible bioinformatics approaches for high-throughput RNA-Seq data analyses
COSI: RNA COSI
  • Patrick Blumenkamp, Justus-Liebig University Giessen, Germany
  • Patrick Barth, Justus-Liebig University Giessen, Germany
  • Raphael Müller, Justus-Liebig University Giessen, Germany
  • Julian Winter, Justus-Liebig University Giessen, Germany
  • Alexander Goesmann, Justus-Liebig University Giessen, Germany

Short Abstract: The yearly increasing citations of DESeq2, edgeR and limma (an increase of 535 % from 2015 to 2018) show that differential gene expression (DGE) analysis is still an emerging field. The vast amount of data generated by current sequencing instruments underpins the need for automated and reproducible analysis pipelines. We therefore developed a two-component software package for analyzing and visualizing RNA-Seq data, with a focus on DGE analysis. The first component is a modularised Snakemake pipeline generator consisting of quality-control, preprocessing, mapping and in-depth analysis modules. The pipelines are built for high-throughput analyses and can be executed on local machines as well as on high-performance compute clusters. Each pipeline is entirely reproducible, and the existing collection of customizable and extendable modules increases the flexibility of pipeline generation. The second component is a dynamic HTML document for visualizing DGE results. All charts are interactive and can be saved in common image file formats. Combined, both components support the full data analysis process, from the initial handling of raw RNA-seq data to the final DGE analyses and result visualization.

T-32: Learning to Fold RNAs in Linear Time
COSI: RNA COSI
  • F A Rezaur Rahman Chowdhury, Baidu Research, United States
  • Liang Huang, Oregon State University and Baidu Research USA, United States
  • He Zhang, Baidu Research USA, United States

Short Abstract: RNA secondary structure prediction is a well-studied problem with applications in the medical domain, and both physics-based and machine learning-based models have been used to solve it. Compared to physics-based models, machine learning-based models learn feature weights from data and thereby address the limitations of experimentally measured thermodynamic parameters. However, existing machine learning-based methods such as CONTRAfold and MXfold remain expensive to train because their inference algorithms run in cubic time. Recently, LinearFold used left-to-right dynamic programming and beam search to predict RNA secondary structure in linear time. In this work, we combined LinearFold's efficiency with the structured perceptron training algorithm to learn an RNA secondary structure prediction model. Testing on a dataset with diverse RNA families, we showed that training is 4 times faster than with MXfold. Moreover, compared with CONTRAfold and MXfold, LinearFold with the newly learned feature weights is more accurate.
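A structured perceptron updates the weights by the difference between the features of the gold structure and those of the decoder's prediction, w ← w + η(φ(gold) − φ(pred)). In the abstract the prediction would come from LinearFold's linear-time search. Here it is simply passed in. The feature set (base-pair types plus an unpaired count) is a toy assumption, not the authors' model, and the function names are hypothetical.

```python
from collections import Counter


def base_pairs(structure):
    """Pairs (i, j) from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs


def features(seq, structure):
    """Toy feature vector: counts of each base-pair type plus unpaired bases."""
    phi = Counter(f"pair_{seq[i]}{seq[j]}" for i, j in base_pairs(structure))
    phi["unpaired"] = structure.count(".")
    return phi


def perceptron_update(weights, seq, gold, predicted, lr=1.0):
    """Structured perceptron step: w += lr * (phi(gold) - phi(predicted))."""
    if gold == predicted:
        return weights
    phi_gold, phi_pred = features(seq, gold), features(seq, predicted)
    for f in set(phi_gold) | set(phi_pred):
        weights[f] = weights.get(f, 0.0) + lr * (phi_gold[f] - phi_pred[f])
    return weights


w = perceptron_update({}, "GGGAAACCC", "(((...)))", ".........")
print(sorted(w.items()))  # [('pair_GC', 3.0), ('unpaired', -6.0)]
```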

T-33: LinearCoFold: Two-Strand RNA Folding in Linear Time
COSI: RNA COSI
  • He Zhang, Baidu Research USA, United States
  • Liang Huang, Oregon State University and Baidu Research USA, United States

Short Abstract: Most ncRNAs function through RNA-RNA interactions, so fast and reliable RNA secondary structure prediction that accounts for RNA-RNA interactions is desirable. Some existing tools, such as RNAhybrid and RNAduplex, are both less informative and less accurate because they ignore the competition between intermolecular and intramolecular base pairs. Another group of tools, such as RNAup, focuses on predicting the binding region rather than the two-strand co-folding structure. Other tools, such as RNAcofold, are slow. We present LinearCoFold, which predicts pseudoknot-free co-folding structures in linear runtime and space. LinearCoFold is a global co-folding approach without restriction on base-pair length and can output both intermolecular and intramolecular base pairs. LinearCoFold extends LinearFold to two-strand co-folding by concatenating the two interacting RNAs, and adopts left-to-right dynamic programming (DP). This DP order allows it to apply a beam-pruning heuristic to improve performance. LinearCoFold is 6 times faster than RNAcofold for an RNA-RNA complex with 6,000+ nucleotides. Despite using approximate search, LinearCoFold is also more accurate than RNAcofold: overall PPV/sensitivity increases by 0.99/3.39. LinearCoFold also significantly improves accuracy on the longest families; for the longest family, PPV/sensitivity is improved by 13.33/15.38, respectively.
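After the two strands are concatenated, every predicted pair can be labelled intermolecular or intramolecular from the strand boundary alone. The sketch below assumes the '&'-delimited dot-bracket convention used by ViennaRNA tools. The function name is hypothetical, and it does not claim to reproduce LinearCoFold's output format.

```python
def split_pairs(cofold_structure):
    """Classify base pairs of a two-strand '&'-delimited dot-bracket string.

    Returns (intermolecular, intramolecular) pairs as 0-based indices on the
    concatenation of strand A and strand B.
    """
    strand_a, strand_b = cofold_structure.split("&")
    len_a = len(strand_a)
    stack, inter, intra = [], [], []
    for j, ch in enumerate(strand_a + strand_b):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            # A pair is intermolecular when its ends lie on different strands
            (inter if (i < len_a) != (j < len_a) else intra).append((i, j))
    return inter, intra


print(split_pairs("(.()&)..."))  # ([(0, 4)], [(2, 3)])
```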

T-34: sRNA Analysis with TEsmall: A holistic approach to differential expression
COSI: RNA COSI
  • Kathryn O'Neill, Cold Spring Harbor Laboratory, United States
  • Wen-Wei Liao, McDonnell Genome Institute, United States
  • Ami Patel, Icahn School of Medicine at Mount Sinai, United States
  • Molly Hammell, Cold Spring Harbor Laboratory, United States

Short Abstract: MicroRNAs (miRNAs) typically dominate small RNA transcriptomes, yet many other classes are present, including tRNAs, snoRNAs, snRNAs, Y-RNAs, piRNAs and siRNAs. The proportions of these molecules vary greatly by cell type, and the interactions between the processing machinery and targeting networks of these small RNA classes remain unclear, largely because the classes are typically analyzed separately. Concurrent handling of sRNA classes facilitates analysis of the regulation of transposable elements, which have been shown to be regulated by piRNAs, siRNAs and tRNAs. We present TEsmall, a tool for the simultaneous processing and analysis of sRNAs from each annotated class in a single integrated workflow. The pipeline starts from raw fastq reads and proceeds all the way to count tables formatted for differential expression analysis, along with several interactive figures summarizing length and annotation class distributions. Analysis with TEsmall in melanoma cell lines identified potential markers of resistance and facilitated the investigation of tRNA 3’ fragments mapping antisense to endogenous retroviruses, which potentially serve as transposon-regulatory tRNA-derived small RNAs (tRFs). We are currently implementing an expectation maximization algorithm to redistribute ambiguously mapped sRNAs, in order to increase statistical power in differential analysis, as well as specific 3’ tRF handling to report the tRNA of origin and putative transposon targets.
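Expectation maximization for multi-mapping reads alternates two steps. The E-step splits each ambiguous read across its candidate features in proportion to their current abundances. The M-step re-estimates the abundances from those expected counts. The sketch below is a generic version of this idea, not TEsmall's planned implementation. It ignores feature length and alignment quality, and the function name is hypothetical.

```python
def em_redistribute(read_hits, n_iter=100):
    """Fractionally assign multi-mapping reads to features by EM.

    read_hits: one list per read of the features it aligns to equally well.
    Returns expected read counts per feature.
    """
    features = sorted({f for hits in read_hits for f in hits})
    abundance = {f: 1.0 / len(features) for f in features}
    counts = {}
    for _ in range(n_iter):
        # E-step: split each read in proportion to current abundances
        counts = {f: 0.0 for f in features}
        for hits in read_hits:
            total = sum(abundance[f] for f in hits)
            for f in hits:
                counts[f] += abundance[f] / total
        # M-step: re-estimate relative abundances from expected counts
        n_reads = sum(counts.values())
        abundance = {f: c / n_reads for f, c in counts.items()}
    return counts


# 8 reads unique to A, 2 unique to B, 1 ambiguous between A and B
reads = [["A"]] * 8 + [["B"]] * 2 + [["A", "B"]]
result = em_redistribute(reads)
print({f: round(c, 3) for f, c in result.items()})  # {'A': 8.8, 'B': 2.2}
```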

T-35: Trinity SuperTranscripts in the Cancer Transcriptome Analysis Toolkit
COSI: RNA COSI
  • Vrushali Fangal, Broad Institute of MIT and Harvard, United States
  • Brian Haas, Harvard University, United States

Short Abstract: The Trinity de novo RNA assembler is the basis of the Trinity Cancer Transcriptome Analysis Toolkit, which targets the study of cancer biology via transcriptome analysis. We extended its capabilities by incorporating a new data representation called SuperTranscript (ST) (Davidson et al., 2017), which facilitates gene-level analysis of assembled isoforms by combining all their exons in a single gene-like ST representation. STs can be used as a reference for differential transcript usage analysis and variant calling in a genome-free manner, accommodating the frequent genomic and transcriptomic rearrangements in cancer cells. Using the Trinity assembly graph to construct STs, rather than realigning transcripts as in the original method paper, substantially reduced the compute time needed to build STs. Faithfulness to the input also improved, as shown by increased accuracy in the results, providing an effective genome-free way to perform gene-level analyses in cancer biology and non-model organisms.
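Building a SuperTranscript from a graph can be pictured as follows. The isoforms are paths through shared sequence nodes, and ordering those nodes topologically yields one linear, gene-like sequence that contains every exon once. This hypothetical sketch assumes an acyclic graph whose node sequences are already known. It does not reproduce how Trinity's assembly graph is actually traversed.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_supertranscript(isoforms, node_seq):
    """Linearise isoform paths through shared nodes into one sequence.

    isoforms: list of paths, each a list of node ids in 5'->3' order
    node_seq: node id -> sequence of that node
    Returns the node order and the concatenated SuperTranscript sequence.
    Raises graphlib.CycleError if the paths imply a cycle.
    """
    predecessors = {}
    for path in isoforms:
        for node in path:
            predecessors.setdefault(node, set())
        for upstream, downstream in zip(path, path[1:]):
            predecessors[downstream].add(upstream)
    order = list(TopologicalSorter(predecessors).static_order())
    return order, "".join(node_seq[n] for n in order)


# Hypothetical gene: full-length isoform e1-e2-e3 plus exon-skipping e1-e3
order, st = build_supertranscript(
    [["e1", "e2", "e3"], ["e1", "e3"]], {"e1": "AAA", "e2": "CC", "e3": "GGGG"}
)
print(order, st)  # ['e1', 'e2', 'e3'] AAACCGGGG
```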

T-36: inferCNV - predicting CNA from single cell tumor RNA-seq
COSI: RNA COSI
  • Christophe Georgescu, Harvard University, United States
  • Maxwell Brown, Harvard University, United States
  • Brian Haas, Harvard University, United States

Short Abstract: Chromosomal structural aberrations, including rearrangements and aneuploidy, are frequently associated with diseases such as cancer. Their detection can help guide patient treatment and prognosis. One class of structural variants often found in tumor genomes comprises copy number alterations (CNAs), in which the copy number of a particular chromosomal region is altered by duplication or deletion events. Although methods for exploring these abnormalities have been available for decades, tumors are heterogeneous ecosystems of cells, including diverse malignant cells and non-malignant cells of the microenvironment, such as immune, stromal or endothelial cells. With modern single-cell (sc) transcriptome sequencing technologies, the resolution at which tumor heterogeneity can be identified has been extended to the sc level and to single-nucleotide resolution. We implemented the tool ‘inferCNV’ to process sc expression data and facilitate inference of CNAs through a range of visualizations and exploratory methods. We further integrated CNA prediction into inferCNV via a Hidden Markov Model (HMM) parameterized by sample-specific simulations of scRNA-Seq data including CNAs, enabling prediction of multiple discrete levels of amplification or deletion. A latent Bayesian network mixture model was implemented to filter possible false positives identified by the HMM and to provide posterior probabilities for HMM-identified CNAs.

T-37: Fusion Transcript Prediction Accuracy Analysis Framework
COSI: RNA COSI
  • Brian Haas, Harvard University, United States
  • Alexander Dobin, Cold Spring Harbor Laboratory, United States

Short Abstract: Genomic rearrangements often fuse genes together in an unnatural context that can disrupt or alter gene functions. When a fusion disrupts a tumor suppressor or activates an oncogene, the fusion gene can become a potent driver of cancer. Evidence for such gene fusions can be detected by transcriptome sequencing (RNA-seq), using specialized software to search the sequencing data for chimeric gene products. Many algorithms and software tools have been developed over the last decade to detect fusion transcripts from RNA-seq, but there remains much room for improvement in fusion prediction accuracy and runtime performance. Here we describe a framework for benchmarking fusion transcript prediction tools using both simulated and genuine RNA-seq data, and we evaluate prediction accuracy across more than a dozen prediction methods. We highlight strengths, weaknesses, and areas for further development.
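
The core accuracy calculation in such a benchmark can be sketched as a set comparison between predicted and true fusions. The sketch assumes fusions are represented as ordered gene pairs written "GENEA--GENEB" (a naming convention assumed here); a full framework would also have to reconcile gene-name aliases and possibly breakpoint positions.

```python
def fusion_accuracy(predicted, truth):
    """Precision, recall and F1 of predicted fusion gene pairs against a truth set.

    Fusions are compared as ordered 5'--3' gene pairs, e.g. "GENEA--GENEB".
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # true fusions that were reported
    fp = len(predicted - truth)   # reported fusions not in the truth set
    fn = len(truth - predicted)   # true fusions that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}
```

Computing F1 at a single score threshold is one option; sweeping a confidence threshold to build precision-recall curves is another way to compare tools.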

T-38: A Dimensionally Reduced Model for the Classification of RNA-Seq Anopheles gambiae Data
COSI: RNA COSI
  • Micheal Arowolo, Landmark University, Omu Aran, Nigeria
  • Roseline Ogundokun, Department of Computer Science, Landmark University, Omu-Aran, Nigeria
  • Marion Adebiyi, Department of Computer Science, Landmark University, Omu-Aran, Nigeria

Short Abstract: A significant application of RNA-Seq gene expression data is the classification and prediction of biological models, and dimension reduction is an essential component of such analyses. This study compares the performance of support vector machine (SVM) classifiers, namely SVMs with polynomial and Gaussian kernels, on data reduced with principal component analysis (PCA) feature extraction. Accuracy and computational performance metrics were evaluated for each procedure. An RNA-Seq dataset of the malaria vector was used for classification, and an accuracy of 99.68% was achieved.
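
The PCA step can be sketched in a few lines with NumPy via a singular value decomposition of the mean-centred expression matrix (samples as rows). This is a generic illustration, not the study's code; the number of components is a free parameter here.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project samples (rows of X) onto the top principal components.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)               # PCA operates on mean-centred features
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:n_components].T     # coordinates in the reduced space
    explained = S ** 2 / np.sum(S ** 2)         # variance captured by each component
    return scores, explained[:n_components]
```

The reduced `scores` would then be the input to the SVM classifiers, e.g. scikit-learn's `SVC(kernel="poly")` and `SVC(kernel="rbf")` for the polynomial and Gaussian kernels. To avoid optimistic accuracy estimates, PCA should be fitted on the training samples only, with test samples projected onto the same components.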

T-39: A versatile web-accessible single cell RNA-seq processing platform
COSI: RNA COSI
  • Andreas Hoek, Justus Liebig University Giessen, Germany

Short Abstract: Single-cell RNA-seq (scRNA-seq) enables analysis of cellular transcriptomes at unprecedented resolution, allowing, e.g., the identification of previously undiscovered rare cell populations and their marker genes, the detection of heterogeneity between cells of the same type, and the tracking of transcriptional programs during differentiation. Current high-throughput sequencing techniques yield enormous amounts of data that need to be analyzed. Hence, there is a need for bioinformatic analysis solutions adapted to the specific challenges of scRNA-seq data. Here we present a versatile application designed for the management, analysis and interpretation of high-throughput scRNA-seq data. The software addresses all aspects from initial quality control, demultiplexing and reference alignment to downstream statistical evaluation. Furthermore, it provides an automated workflow suitable for non-bioinformaticians as well as experts. The software is available via a web-based interface and supports deployment to both local and cloud-based compute infrastructures. Finally, the application has already been applied successfully to various datasets, including data derived from mouse lung organoid cells.
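
Demultiplexing assigns each read to a cell by its barcode. A common strategy, sketched here as an assumption rather than this platform's documented behaviour, accepts a barcode from a whitelist of known cell barcodes if it matches exactly or is the unique closest whitelist entry within a small Hamming distance.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, whitelist, max_mismatch=1):
    """Return the whitelist barcode matching `observed`, or None if no unique best match."""
    scored = [(hamming(bc, observed), bc) for bc in whitelist if len(bc) == len(observed)]
    scored = [(d, bc) for d, bc in scored if d <= max_mismatch]
    if not scored:
        return None
    best = min(d for d, _ in scored)
    hits = [bc for d, bc in scored if d == best]
    return hits[0] if len(hits) == 1 else None  # ambiguous ties are rejected


def demultiplex(reads, whitelist, max_mismatch=1):
    """Group (barcode, sequence) reads by corrected barcode; unassigned reads go under None."""
    bins = defaultdict(list)
    for barcode, seq in reads:
        bins[assign_barcode(barcode, whitelist, max_mismatch)].append(seq)
    return dict(bins)
```

Requiring a unique best match avoids assigning reads whose barcode is equally close to two cells.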

T-40: snoDB: Visualizing the Human snoRNome
COSI: RNA COSI
  • B Philia, Université de Sherbrooke, Canada
  • Michelle Scott, University of Sherbrooke, Canada

Short Abstract: Small nucleolar RNAs (snoRNAs) are a family of highly structured small non-coding RNAs that are often nested within other genes (called host genes) and are conserved across all eukaryotes. Their canonical function is to guide site-specific modifications of pre-rRNA during ribosome biogenesis. A dozen snoRNAs have more recently been ascribed non-canonical functions, including the regulation of gene expression and the mediation of oxidative stress, as well as roles in various diseases. Emerging discoveries of novel targets, novel members and non-uniform expression patterns together indicate that the landscape of snoRNA cellular functionality is broader than it once seemed, with much left to uncover. To facilitate further characterization of human snoRNAs, in particular their non-canonical and emerging functions, we created snoDB, a comprehensive and interactive online database. snoDB brings together data that are currently scattered throughout the literature and across repositories of large-scale datasets, annotations and specialized databases. snoDB currently features data on predicted RNA-RNA interactions, conservation, annotations and host genes, as well as our own RNA-seq expression data from various tissues, generated using a methodology suited to highly structured RNAs. Expression data can be selectively visualized in dynamic heatmaps, and we are working on visualizing interactions in network form.

T-41: RNA Splicing Analysis for Large Heterogeneous Datasets
COSI: RNA COSI
  • Jorge Vaquero-Garcia, University of Pennsylvania, United States
  • Scott Norton, BioCiphers, United States
  • Nicholas Lahens, University of Pennsylvania, United States
  • Greg Grant, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States

Short Abstract: The ubiquitous use of RNA-Seq has resulted in large-scale datasets. These data, such as GTEx and TCGA, involve thousands of samples and are heterogeneous in nature. Capturing RNA splicing variations from such data poses algorithmic, computational and visualization challenges. To address these challenges we developed MAJIQ-HET, which offers a unique (to the best of our knowledge) combination of features: scaling to thousands of samples; detection of de novo (unannotated) junctions, exons, and intron retention events; detection of complex splicing variations (involving more than two junctions); built-in interactive visualization; and built-in connectivity to other tools, including UCSC and automatic primer design for experimental validation (Green et al., 2017). Using a new large-scale “realistic” synthetic dataset as well as GTEx samples, we demonstrate that HET compares favourably with current state-of-the-art tools for large-scale analysis (rMATS, LeafCutter and SUPPA) in multiple accuracy metrics. HET is also competitive in running time and memory usage while retaining unique features absent in other tools, such as de novo intron retention detection and built-in correction for known and unknown confounding factors. Overall, MAJIQ-HET represents a significant advance in our ability to accurately capture RNA splicing variations from large heterogeneous datasets.
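
MAJIQ quantifies the relative inclusion (PSI) of each junction within a local splicing variation and works with posterior distributions rather than raw fractions. As a simplified sketch, the posterior mean under a symmetric Dirichlet prior (an assumption here, not MAJIQ's exact model) turns junction read counts into PSI values for events with any number of junctions.

```python
def junction_psi(junction_counts, prior=0.0):
    """Relative usage (PSI) of each junction in one local splicing variation.

    junction_counts: dict junction ID -> read count.
    With prior > 0 this is the posterior mean of a Dirichlet(prior, ..., prior) model;
    prior=0 gives the raw read fraction.
    """
    total = sum(junction_counts.values()) + prior * len(junction_counts)
    if total == 0:
        raise ValueError("no reads and no prior: PSI is undefined")
    return {j: (c + prior) / total for j, c in junction_counts.items()}
```

A small positive `prior` shrinks low-coverage events towards equal usage; the same formula covers complex events with more than two junctions.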

T-42: Genome wide quantification of ADAR A-to-I RNA editing activity
COSI: RNA COSI
  • Shalom Hillel Roth, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Israel
  • Erez Y. Levanon, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Israel
  • Eli Eisenberg, Raymond and Beverly Sackler School of Physics and Astronomy and Sagol School of Neuroscience, Tel Aviv University, Israel

Short Abstract: Adenosine (A)-to-inosine (I) RNA editing, in which inosine is interpreted as guanosine (G), is a ubiquitous and critical RNA modification catalyzed by the ADAR protein family in double-stranded RNA (dsRNA). Editing affects coding sequences, mRNA processing and regulation, and tissue inflammation. Aberrant RNA editing has recently been associated with cancer, autoimmune disorders such as psoriasis and SLE, autism, and the efficacy of PD-L1 inhibitors. Thus, global quantification of editing is an important tool for studying these conditions. For example, over 90% of all human editing activity occurs within Alu regions, encompassing millions of sites but with a very low mean editing level (< 1%). Quantifying editing rates at these individual sites therefore requires practically unachievable ultra-high coverage, so de novo detection schemes are biased, picking up only a random fraction of the editing signal. We created a publicly available software package enabling a straightforward calculation of global editing levels from raw RNA alignment files. We applied it to the GTEx project, analyzing global editing patterns in 8848 tissue samples. We found it robust and consistent with respect to varying read length, coverage and alignment scheme. To demonstrate the versatility of this method, we also adapted and applied it to murine data.
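
The key idea is to aggregate the editing signal over all Alu adenosines instead of calling individual sites. A minimal sketch of such a global index, assuming a pileup already restricted to Alu positions and oriented to the transcribed strand (so editing appears as A-to-G), divides all A-to-G mismatches by the total A and G coverage at reference adenosines. The input format is hypothetical and not the package's interface.

```python
def alu_editing_index(pileup):
    """Global A-to-G editing percentage over all reference adenosines in a pileup.

    pileup: iterable of (ref_base, base_counts) per position, where base_counts maps
    read bases to counts, e.g. ("A", {"A": 99, "G": 1}). Positions are assumed to lie
    in Alu elements and to be oriented to the transcribed strand.
    """
    edited = covered = 0
    for ref_base, counts in pileup:
        if ref_base != "A":
            continue  # only adenosines can be A-to-I edited
        g = counts.get("G", 0)
        edited += g
        covered += counts.get("A", 0) + g  # other mismatches are ignored
    return 100.0 * edited / covered if covered else 0.0
```

Because numerator and denominator are summed over many sites, the index stays informative even when no single site has enough coverage to be called edited.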

T-43: Adjusting for known and unknown confounding factors in RNASeq based splicing analysis
COSI: RNA COSI
  • Barry Slaff, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States

Short Abstract: BACKGROUND Correcting for confounding factors when studying gene expression has received much attention. Remarkably, there are no equivalent methods for RNA splicing analysis. There is therefore a strong need for a method that integrates with existing splicing tools, scales to datasets with thousands of samples and millions of junctions, and produces adjusted counts enabling greater insights from exploratory analyses and differential splicing tests. RESULTS We developed MOCCASIN, a method for adjusting splicing data for confounding factors. The method can correct for known and unknown confounding factors, scales to very large datasets, and produces adjusted counts that can be integrated with splicing quantification and visualization tools such as MAJIQ. Using RNA-Seq data from hundreds of cancer samples (AML, ALL), we demonstrate the effectiveness of MOCCASIN and show that it compares favorably with a procedure recently used by ENCODE on splicing data. We show that corrected splicing quantifications improve the detection of both known (treatment) and unknown (gender) signals. CONCLUSIONS MOCCASIN is the first confounding factor correction method integrated with splicing quantification and analysis software (MAJIQ). The method successfully removes variation from known and unknown confounders, reduces false positives in differential splicing analysis, and scales to consortium-level RNA-Seq datasets such as TCGA and GTEx.
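
A common generic way to adjust for known confounders is to fit a linear model per feature and subtract the fitted confounder effect while keeping each feature's mean. The sketch below does this with NumPy least squares on a samples-by-features matrix (e.g. junction counts); it illustrates the general technique and is not MOCCASIN's algorithm. Unknown factors are often handled by first estimating latent covariates, for example leading principal components of the residuals, and passing them in as additional columns.

```python
import numpy as np


def regress_out(Y, confounders):
    """Remove linear effects of confounders from each column of Y, keeping column means.

    Y: (samples x features) matrix; confounders: (samples x k) matrix.
    """
    Y = np.asarray(Y, dtype=float)
    C = np.asarray(confounders, dtype=float)
    C = C - C.mean(axis=0)  # centre so the intercept carries each feature's mean
    design = np.column_stack([np.ones(len(Y)), C])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - C @ beta[1:]  # subtract only the confounder part of the fit
```

Covariates of interest (such as treatment) should not be passed as confounders, or their signal will be removed too. Adjusted values can become negative or non-integer, so converting them back to counts needs an additional, tool-specific step.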

T-44: Chromatin-enriched RNAs mark both active and repressive cis-regulation: a computational analysis of nuclear RNA-seq
COSI: RNA COSI
  • Xiangying Sun, Purdue University, United States
  • Zhezhen Wang, The University of Chicago, United States
  • Carlos Perez-Cervantes, The University of Chicago, United States
  • Alex Ruthenburg, The University of Chicago, United States
  • Ivan Moskowitz, The University of Chicago, United States
  • Michael Gribskov, Purdue University, United States
  • Xinan Yang, The University of Chicago, United States

Short Abstract: Most long noncoding RNAs (lncRNAs) localize to the cell nucleus, where they influence important biological processes. Modern RNA sequencing of nuclei and/or their components has revealed cis-regulatory lncRNAs, including chromatin-enriched nuclear RNA (cheRNA) that is tightly bound to chromatin. However, a rigorous analytic pipeline for nuclear RNA-seq is lacking. In this study, we survey four computational strategies for nuclear RNA-seq data analysis and show that a new pipeline (Tuxedo) outperforms the others in complete transcriptome assembly and accurate cheRNA identification. Analyzing well-studied K562 datasets with the Tuxedo pipeline, we characterize genomic features of intergenic cheRNA (icheRNA) that are similar to those of enhancer RNA (eRNA). Moreover, we quantify the transcriptional correlation between icheRNA and adjacent genes, showing that icheRNA is more positively correlated in expression with its neighboring gene than are eRNAs predicted by state-of-the-art methods or by CAGE (cap analysis of gene expression) signals. We further propose icheRNA coinciding with H3K9me3 marks as a very effective predictor of novel chromatin-based eRNA, and a potential cis-repressive function for antisense cheRNAs (as-cheRNAs); these activities are likely involved in transiently modulating cell type-specific cis-regulation. These findings demonstrate that rigorous computational analysis of nuclear RNA-seq can shed new light on cis-regulation.
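
The neighbor-correlation analysis can be sketched as two steps: find the gene closest to each intergenic cheRNA, then correlate their expression across samples. The gene-tuple layout and the nearest-gene rule (distance 0 inside a gene body) are assumptions for illustration; the study's exact neighbor definition is not given in the abstract.

```python
import math


def nearest_gene(chrom, pos, genes):
    """Return the name of the gene closest to (chrom, pos), or None.

    genes: list of (name, chrom, start, end) tuples; distance is 0 inside a gene body.
    """
    def distance(gene):
        _, _, start, end = gene
        if start <= pos <= end:
            return 0
        return min(abs(pos - start), abs(pos - end))

    same_chrom = [g for g in genes if g[1] == chrom]
    return min(same_chrom, key=distance)[0] if same_chrom else None


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Repeating the same calculation for eRNA loci predicted by other methods gives the kind of comparison described above.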

T-45: ExpressionAble: Making it easy to transform expression data from one file format to another
COSI: RNA COSI
  • Nathan Mella, Brigham Young University, United States
  • Brandon J Fry, Brigham Young University, United States
  • Stephen Piccolo, Brigham Young University, United States

Short Abstract: Scientists have generated millions of gene-expression profiles using microarray and RNA-Sequencing assays. These experiments produce large, tabular datasets, which can be mined for biomedically relevant patterns. However, there are many different formats for storing the data, and analysis tools (such as R and Excel) do not support all these formats. Consequently, researchers spend an inordinate amount of time converting gene-expression files from one format to another. Also, researchers often wish to work with only a subset of genes or samples in a given dataset (specific rows and columns), but writing code to perform these steps for every file format is inefficient. To ease this process, we have developed an open-source tool called ExpressionAble. This tool enables researchers to quickly transform expression data from one format to another. It can be used either as a command-line tool or as a Python module for programmatic access. We have developed custom parsers to load data from their original sources, including the file formats used by salmon, kallisto, STAR, and the GEO database. After importing the data into ExpressionAble, users can then select columns, merge files, filter samples, and export the data to 14 different tabular formats for downstream analyses.
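
The select/filter/export workflow can be illustrated with Python's standard `csv` module. This sketch is not ExpressionAble's API; the function name and arguments are hypothetical, and it only re-delimits one tabular format into another while keeping chosen columns and rows.

```python
import csv
import io


def convert_table(text, in_delim="\t", out_delim=",", keep_columns=None, row_filter=None):
    """Convert delimited expression text to another delimiter, selecting columns and rows.

    keep_columns: column names to keep (default: all, in original order).
    row_filter: optional callable taking a row dict and returning True to keep the row.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=in_delim)
    columns = keep_columns or reader.fieldnames
    out = io.StringIO()
    # extrasaction="ignore" silently drops columns that were not selected
    writer = csv.DictWriter(out, fieldnames=columns, delimiter=out_delim,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row_filter is None or row_filter(row):
            writer.writerow(row)
    return out.getvalue()
```

For example, `convert_table(text, keep_columns=["gene", "S2"], row_filter=lambda r: float(r["S2"]) > 5)` keeps one sample column and only the genes above 5 in that sample.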

T-46: High resolution analysis of functional regions of mRNA folding in protein-coding sequences across the tree of life
COSI: RNA COSI
  • Michael Peeri, Tel-Aviv University, Israel
  • Tamir Tuller, Tel Aviv University, Israel

Short Abstract: mRNA can form local structures; local folding strength affects the interaction with the ribosome and is thought to influence many additional aspects of gene expression. However, how evolution shapes local mRNA structure strength in coding sequences is still poorly understood. In this study, we performed the first analysis of selection on secondary-structure strength in the coding sequences of 513 species, taking their phylogenetic relationships into account. We show that coding sequences in phyla from all three domains of life contain consistent regions of increased or decreased secondary-structure strength. These regions coincide with mRNA sections involved in different gene expression processes, in particular different stages of protein translation (initiation, elongation and termination), indicating that folding strength has a role in these processes. The increase or decrease in secondary-structure strength in different parts of the coding sequence correlates with genomic and environmental traits and is disrupted in species expected to experience weak selection for efficient gene expression (such as species with weak codon bias or intracellular replication). Our results suggest that mRNA secondary-structure strength is maintained under selection to improve gene expression efficiency. This mechanism complements other synonymous features of the coding sequence in regulating mRNA concentrations and optimizing the translation process.

T-47: Feature reduction of CRISPR-Cas9 on-target efficiency prediction improves the accuracy
COSI: RNA COSI
  • Jan Gorodkin, University of Copenhagen, Denmark
  • Guilia Corsi, University of Copenhagen, Denmark
  • Ferhat Alkan, University of Copenhagen, Denmark

Short Abstract: The CRISPR-Cas9 system has become a highly popular tool for cutting DNA in genome editing. However, not all guide RNAs (gRNAs) cleave equally efficiently. Consequently, a wide variety of (machine learning) methods for predicting cleavage efficiency have been developed; they typically take >600 features of the gRNA as input and experimentally determined efficiencies as output. The features are typically derived only from the primary sequence, so potentially relevant features from gRNA self-folding and binding interactions with the DNA are omitted, while redundant features can hamper prediction accuracy. Here, we analyzed the 629 input features used to predict the cleavage efficiency of the Cas9-gRNA tool. We employed five-fold cross-validation with an independent test set for different architectures of a gradient boosted tree, together with a feature elimination strategy, to reduce the number of input features to 129 without performance loss. Adding five energy-related features yields a reduction to 98 features while increasing performance on the independent test set. When comparing the 15 highest-weighted features among the trained models, known features such as the cutting position are among them. Interestingly, our five energy-related features are consistently weighted as top features in all five resulting models.
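
The abstract does not detail the elimination procedure, so the sketch below shows one generic variant, backward elimination: a feature is dropped whenever the score without it stays within `tolerance` of the full model's score. `score_fn` is a hypothetical callback that would, for instance, train a gradient boosted tree with five-fold cross-validation on the given feature subset.

```python
def backward_elimination(features, score_fn, tolerance=0.0):
    """Greedily remove features whose removal keeps the score >= baseline - tolerance.

    features: list of feature names.
    score_fn: callable taking a list of features and returning a score (higher is better).
    The baseline is the score of the full feature set, so losses cannot accumulate.
    """
    selected = list(features)
    baseline = score_fn(selected)
    removed_one = True
    while removed_one and len(selected) > 1:
        removed_one = False
        for feature in selected:
            trial = [f for f in selected if f != feature]
            if score_fn(trial) >= baseline - tolerance:
                selected = trial
                removed_one = True
                break  # restart the scan on the smaller feature set
    return selected
```

Because every candidate removal requires retraining, this greedy loop is quadratic in the number of features; with several hundred features a practical pipeline would more likely rank features by importance and remove the weakest in batches.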

T-48: Multi-omics data integration uncovers candidate molecular biomarkers for minimal hepatic encephalopathy
COSI: RNA COSI
  • Teresa Rubio, Centro de Investigacion Principe Felipe, Spain
  • Carmina Montoliu, INCLIVA, Spain
  • Vicente Felipo, Centro de Investigacion Principe Felipe, Spain
  • Sonia Tarazona, Universitad Politécnica de Valencia, Spain
  • Ana Conesa, University of Florida, United States

Short Abstract: Multi-omics approaches are increasingly used to search for candidate biomarkers of diagnosis and disease progression. However, multi-omics assays are expensive, so pilot studies are frequently run first, or the study can only be applied to a small number of individuals. This was the case in a recent multi-omics study of Minimal Hepatic Encephalopathy (MHE). MHE is a neurological syndrome affecting more than 2 million people in the EU that produces mild cognitive impairment in cirrhotic patients. To gain insight into the molecular mechanisms behind MHE, improve diagnosis and suggest potential therapeutic targets, a multi-omics analysis (transcriptomics, metabolomics and a panel of interleukins) of human peripheral blood cells was conducted on 10 individuals. We developed a three-step pipeline to analyze these data for biomarker identification. This approach combines univariate statistics, multivariate PLS, clustering, network analysis and database integration. Using this approach, we identified CCL20 and CX3CL1 as biomarker compounds associated with genes involved in chemotaxis. Our results suggest an autoimmune response in peripheral blood against neural cell types that might migrate to neural tissue, causing the cognitive decline. Our pipeline can be used in similar multi-omics study designs where multiple data types are generated for a small set of individuals.
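
One ingredient of such a pipeline, the network analysis, can be sketched as a cross-omics correlation network: features from different layers are linked when their profiles across individuals are strongly correlated. The threshold is a hypothetical choice, and with only 10 individuals correlations are noisy, so such a network is best treated as one line of evidence among several.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 for constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_network(profiles, threshold=0.8):
    """Edges (feature_a, feature_b, r) between features whose |Pearson r| >= threshold.

    profiles: dict feature name -> values measured in the same individuals, same order.
    Features may come from different omics layers (genes, metabolites, interleukins).
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges
```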

T-49: Isoforms across single cells and brain cell types.
COSI: RNA COSI
  • Hagen Tilgner, Cornell University, United States

Short Abstract: We recently published distinct long-read isoform methods, including a) single-cell isoform RNA sequencing (ScISOr-Seq) [1] and b) synthetic long-read RNA sequencing (SLR-RNA-Seq) [2]. ScISOr-Seq operates on single-cell suspensions from bulk tissue, employs 3’-end sequencing to determine the cell type of each single cell, and then uses isoform sequencing (PacBio or Oxford Nanopore) to determine the complete isoforms of single cells and cell populations. SLR-RNA-seq determines full-length sequences of millions of single molecules using deep short-read sequencing of very few molecules at a time (which, statistically, almost certainly originate from different genes). Importantly, SLR-RNA-seq can work from less than 1 nanogram of material, thus requiring much less PCR. Here, I will describe so-far-unpublished applications of these technologies in the mammalian brain. I will report on the coordination of RNA processing events that do not involve RNA splice sites in single cells, cell lines and central nervous system cell types. Furthermore, our newer datasets reveal an order of magnitude more cell-type-specific isoform expression patterns than our previous datasets. Thus, cell-type-specific isoforms will be a more easily addressed topic in the near future. [1] Gupta*, Collier* et al., Nature Biotechnol, 2018. [2] Tilgner*, Jahanbani* et al., Nature Biotechnol, 2015.
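
The central bookkeeping step of ScISOr-Seq, as described above, is to carry cell-type labels from short-read 3’-end data over to long reads that share a cell barcode. A minimal sketch, assuming long reads have already been assigned a cell barcode and an isoform ID (both hypothetical inputs here):

```python
from collections import Counter, defaultdict


def isoforms_by_cell_type(long_reads, barcode_to_cell_type):
    """Count isoforms per cell type.

    long_reads: iterable of (cell_barcode, isoform_id) from long-read sequencing.
    barcode_to_cell_type: dict mapping cell barcode -> cell type, derived from
    short-read 3'-end clustering of the same cells.
    """
    counts = defaultdict(Counter)
    for barcode, isoform in long_reads:
        cell_type = barcode_to_cell_type.get(barcode)
        if cell_type is not None:  # barcode not seen in short-read data: skip
            counts[cell_type][isoform] += 1
    return dict(counts)
```

Reads whose barcode has no cell-type label are skipped rather than guessed, which keeps the per-cell-type isoform counts conservative.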

T-50: Reference-free transcriptome assembly of nanopore RNA-seq data
COSI: RNA COSI
  • Chen Yang, BC Cancer Genome Sciences Centre, Canada
  • Saber Hafezqorani, BC Cancer Genome Sciences Centre, Canada
  • Ka Ming Nip, BC Cancer Genome Sciences Centre, Canada
  • Rene Warren, BC Cancer, Genome Sciences Centre, Canada
  • Inanc Birol, BC Cancer Genome Sciences Centre, Canada

Short Abstract: In recent years, the number of sequence assembly solutions for nanopore sequencing data has grown. These methods are designed for genome assembly and work poorly, if at all, with RNA-seq data. Existing methods for transcriptome assembly with nanopore RNA-seq reads are either reference-guided or intended for hybrid assembly of long and short reads. To fill this gap, we introduce a de novo assembly method that uses only nanopore RNA-seq data. In our approach, reads are first corrected for errors and then stratified by length and k-mer coverage. Next, error-corrected reads are retrieved from each stratum and clustered into groups of reads that belong to the same gene. Finally, the reads in each cluster are assembled into transcript isoforms. Using a simulated mouse transcriptome dataset, we show that our method was able to correct a significant proportion of errors in the nanopore reads and then assemble full-length isoforms from clusters of reads predominantly representing single genes. Since our method does not rely on a reference for transcript sequence reconstruction, it lays the groundwork for large-scale comparative transcriptomics where high-quality draft genome assemblies are not readily available.
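
The abstract does not specify the clustering criterion. As a hypothetical illustration, the sketch groups error-corrected reads that share enough k-mers (Jaccard similarity of their k-mer sets) and merges groups transitively with union-find. The small default `k` suits toy sequences only, and the all-pairs comparison is quadratic; a practical tool would likely use sketching, such as minimizers, instead.

```python
def kmer_set(seq, k):
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=5, min_jaccard=0.3):
    """Group reads into putative genes by shared k-mers, merging transitively.

    Two reads are linked when the Jaccard similarity of their k-mer sets is at least
    min_jaccard; linked reads are merged with union-find. Returns lists of read indices.
    """
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    kmers = [kmer_set(r, k) for r in reads]
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            union = len(kmers[i] | kmers[j])
            if union and len(kmers[i] & kmers[j]) / union >= min_jaccard:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Running error correction before clustering, as the method does, makes shared k-mers a more reliable signal than it would be on raw nanopore reads.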

T-51: Is this the “end”? Termin(A)ntor: Transcriptome annotation with deep learning
COSI: RNA COSI
  • Chenkai Li, BC Cancer Genome Sciences Centre, Canada
  • Chen Yang, BC Cancer Genome Sciences Centre, Canada
  • Ka Ming Nip, BC Cancer Genome Sciences Centre, Canada
  • Rene Warren, BC Cancer, Genome Sciences Centre, Canada
  • Inanc Birol, BC Cancer Genome Sciences Centre, Canada

Short Abstract: Recent advances in high-throughput sequencing technologies have enabled comprehensive transcriptome analysis at base-level resolution. However, intrinsic biases, such as GC content and PCR cycles, make it challenging to assemble and annotate the 5’ and 3’ ends of transcript isoforms. Consequently, reconstructed transcript sequences may be incomplete, potentially missing untranslated regions, which leads to unreliable functional and regulatory analyses. Thus, it is desirable to have a pipeline that annotates the completeness of assembled transcripts and to incorporate it into RNA-seq analysis routines. Here we present Termin(A)ntor, a transcript annotation utility built upon two deep neural network classifiers. Using as little as 20 bp of sequence from the 5’ or 3’ end of an assembled transcript, our models achieve >81% and >87% accuracy in annotating the 5’ transcription start site and the 3’ polyadenylation site (Poly(A) site), respectively. In benchmarks of Poly(A) site prediction on two human RNA-seq samples, Termin(A)ntor demonstrates both higher sensitivity and higher precision than state-of-the-art methods. We performed cross-species experiments to show that Termin(A)ntor can accurately annotate species without a good reference annotation or genome.
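
Neural network classifiers for sequence typically take a one-hot encoding of the nucleotides as input. The sketch below prepares a 20-nt terminal window in that form; the encoding order and the handling of ambiguous bases are assumptions, and the network architecture used by Termin(A)ntor is not described in the abstract.

```python
BASES = "ACGT"


def one_hot_terminal(seq, length=20, end="3"):
    """One-hot encode the terminal `length` nucleotides of a transcript.

    end="3" takes the last `length` nt, end="5" the first. Each position becomes a
    4-element row in A, C, G, T order; ambiguous bases (e.g. N) become all zeros.
    """
    if len(seq) < length:
        raise ValueError("sequence shorter than the requested window")
    if end not in ("3", "5"):
        raise ValueError("end must be '3' or '5'")
    window = seq.upper()[-length:] if end == "3" else seq.upper()[:length]
    return [[1 if base == b else 0 for b in BASES] for base in window]
```

The resulting `length x 4` matrix is the usual input shape for a convolutional layer over sequence.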

T-52: RNA-seq methodological landscape: the ignored importance of the choice of genome annotations
COSI: RNA COSI
  • Joël Simoneau, Université de Sherbrooke, Canada
  • Simon Dumontier, Université de Sherbrooke, Canada
  • Ryan Gosselin, Université de Sherbrooke, Canada
  • Michelle Scott, University of Sherbrooke, Canada

Short Abstract: The process of transforming RNA-seq data into meaningful quantification of gene features can be decomposed into a series of defined steps. Hundreds of different software tools and several biological resources exist to fulfill these steps. Therefore, users must define a set of steps (e.g., trimming, alignment and quantification software, in addition to a genome assembly and annotation) to process RNA-seq datasets. However, users cannot currently rely on an extensive assessment of the importance of each design choice when creating their own analysis. Our objective is to characterize the relative importance of each methodological step in RNA-seq for gene quantification. First, we characterized the current usage of software, genomes and genomic annotations in the literature through a methodological review. Second, using different permutations of the software and references highlighted by the review, we explored the biases of each step using statistical approaches. Through this methodological review and statistical analyses of RNA-seq data, we show that the choice of genome annotation not only has the biggest impact on gene quantification but is also the least well-described design choice in the literature. We believe that the importance of genome annotation in quantification has been underestimated and not thoroughly characterized.

T-53: Novel bioinformatics tools to assess the functional impact of alternative isoform usage
COSI: RNA COSI
  • Francisco Pardo Palacios, Centro de Investigación Príncipe Felipe (CIPF), Spain
  • Lorena de La Fuente Lorente, Centro de Investigación Príncipe Felipe (CIPF), Spain
  • Pedro Salguero, Centro de Investigación Príncipe Felipe (CIPF), Spain
  • Cristina Marti, CIPF, Spain
  • Manuel Tardaguila, Sanger Institute, United Kingdom
  • Hector del Risco, University of Florida, United States
  • Ana Conesa, University of Florida, United States

Short Abstract: Post-transcriptional mechanisms such as Alternative Splicing (AS) and Alternative Polyadenylation (APA) regulate the maturation of pre-mRNA molecules and may result in different transcripts arising from the same gene. AS and APA increase the diversity and regulatory capacity of transcriptomes and proteomes. AS and APA have been extensively characterised at the mechanistic level but to a lesser extent in terms of functional impact. User-friendly tools for functional profiling at isoform resolution are missing, limiting our capacity to investigate the functional consequences of post-transcriptional RNA maturation. We have developed a novel analysis framework for functional iso-transcriptomics consisting of a pipeline for isoform-resolved functional annotation (IsoAnnot) and user-friendly software to analyse the potential impact of AS and APA (tappAS). IsoAnnot incorporates more than 15 functional databases, while tappAS implements novel algorithms to interrogate the interplay between AS and function. We applied these methods to characterise the iso-transcriptome in neural differentiation and in plant tissues. Our analysis framework reveals the functional motifs differentially included in isoforms to regulate specific biological processes and offers new avenues to investigate the functional consequences of post-transcriptional regulation.
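
A basic step when assessing functional impact at isoform resolution is to ask which annotated features of a gene are shared by all its isoforms and which are present in only some, since only the latter can change with isoform usage. A minimal sketch with hypothetical feature IDs; tappAS's own algorithms are more elaborate.

```python
def classify_features(isoform_features):
    """Split a gene's functional features into those shared by all isoforms and those that vary.

    isoform_features: dict isoform ID -> set of feature IDs (e.g. protein domains, UTR motifs).
    Returns (constant, varying) sets.
    """
    feature_sets = list(isoform_features.values())
    if not feature_sets:
        return set(), set()
    present_anywhere = set().union(*feature_sets)
    present_everywhere = set.intersection(*map(set, feature_sets))
    return present_everywhere, present_anywhere - present_everywhere
```

Only the `varying` features are candidates for differential inclusion when isoform proportions shift between conditions.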

T-54: miRCoop: Identifying Cooperating Pairwise miRNAs via Kernel Based Interaction Test
COSI: RNA COSI
  • Oznur Tastan, Sabanci University, Turkey
  • Gulden Olgun, Bilkent University, Turkey

Short Abstract: MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally by binding complementary sequences in their target messenger RNAs (mRNAs). Recent studies reveal that miRNA pairs can repress the translation of a target mRNA synergistically: when bound together, they induce stronger down-regulation of their target mRNA. Our knowledge of synergistic miRNA pairs is very limited. To identify cooperative miRNA pairs, we propose a new method: miRCoop. miRCoop uses miRNA-mRNA target prediction tools to find miRNA pairs that are predicted to target the same mRNA with non-overlapping binding sites. Using these potential triplets, we conduct kernel-based statistical interaction tests on the expression profiles of miRNAs and mRNAs to identify triplets in which each miRNA's expression is statistically independent of the mRNA's expression when taken individually, but the two miRNAs taken together are dependent with it. Applying miRCoop to kidney cancer patient expression profiles, we find 503 potentially cooperative miRNA:miRNA:mRNA interactions. Several of these miRNAs are regulated by the same transcription factor. Furthermore, some pairs are clustered in the genome. We hope miRCoop will facilitate the mapping of the miRNA functional landscape.
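
The kernel-based building block can be illustrated with the Hilbert-Schmidt Independence Criterion (HSIC) and a permutation test. Note the swap: miRCoop uses kernel interaction tests over miRNA:miRNA:mRNA triplets, whereas this sketch is a simpler two-variable HSIC test. Passing the two miRNAs as a two-column array to `x` tests their joint dependence with the mRNA, so the triplet criterion could be loosely mimicked by running the test three times. The Gaussian kernel width `sigma` is a hypothetical default.

```python
import numpy as np


def gaussian_gram(x, sigma):
    """Gaussian kernel matrix for samples in rows (1-D input is treated as one column)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * x @ x.T, 0.0)  # squared distances
    return np.exp(-d2 / (2 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def hsic_permutation_test(x, y, n_perm=200, sigma=1.0, seed=0):
    """Return (statistic, p-value) for H0: x and y are independent."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = hsic(x, y, sigma)
    # Permuting y across samples breaks any dependence while keeping its distribution
    null = [hsic(x, rng.permutation(y), sigma) for _ in range(n_perm)]
    p_value = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, p_value
```

The permutation p-value cannot fall below `1 / (n_perm + 1)`, and testing many candidate triplets calls for multiple-testing correction.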

T-55: Global analysis of human mRNA folding demonstrates significant population constraint of disruptive synonymous variants
COSI: RNA COSI
  • Peter White, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Jeffrey Gaither, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • David Gordon, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Grant Lammi, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Blythe Moreland, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States

Short Abstract: Current guidelines for variant interpretation classify most synonymous variants (sSNVs) as benign. RNA folding studies suggest mRNA secondary structure is essential for transcription and translation, yet the potential for pathogenic sSNVs impacting RNA folding in human disease is largely unknown. We therefore set out to test the hypothesis that sSNVs predicted to disrupt RNA stability would show significant constraint in the human population. We performed a systematic study of SNPs impacting RNA stability. First, we developed novel cloud-based software using Apache Spark to derive RNA folding metrics for every possible polymorphism in the human transcriptome (~0.5 billion variants). Second, we used population allele frequencies to determine whether SNPs with highly disruptive mRNA folding values were constrained. Third, these metrics were used to construct a Structural Predictivity Index (SPI score). We observed that sSNVs predicted to disrupt mRNA structure are highly constrained, supporting the hypothesis that they play a role in human genetic disease. To our knowledge, SPI is the first metric of its kind to allow assessment of sSNVs. Given that ~75% of rare disease patients have no clinically relevant finding using current variant interpretation approaches that ignore sSNVs, SPI has the potential to enable discovery of new pathogenic variants that impact RNA stability.
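
Scoring every possible polymorphism amounts to enumerating three alternative alleles at each position and recomputing a folding metric for each mutant. The sketch leaves the metric as a caller-supplied function (for example, a minimum free energy from an RNA folding package); the usage lines at the end use a GC count only as a placeholder, not a folding energy. Sequences use the DNA alphabet (T for U) by assumption.

```python
def all_snvs(seq):
    """Yield (position, ref, alt, mutant) for every possible single-nucleotide substitution.

    Positions are 0-based; each position has three alternative alleles.
    """
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                yield i, ref, alt, seq[:i] + alt + seq[i + 1:]


def score_snvs(seq, metric):
    """Change in a per-sequence metric for every possible SNV: metric(mutant) - metric(reference)."""
    reference = metric(seq)
    return [(i, ref, alt, metric(mutant) - reference) for i, ref, alt, mutant in all_snvs(seq)]


def gc_count(s):
    """Placeholder metric for illustration only: GC count is NOT a folding energy."""
    return s.count("G") + s.count("C")


deltas = score_snvs("ATG", gc_count)  # 9 variants, including (0, "A", "G", 1)
```

Each transcript can be processed independently, so this enumeration parallelizes naturally, the kind of workload a distributed framework such as Spark handles well.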

T-56: ShiRlOc: A robust computational approach to analyze Polysome Profiling RNA-Seq Data
COSI: RNA COSI
  • Charles Blatti, University of Illinois at Urbana-Champaign, United States
  • Mikel Heranez, University of Illinois at Urbana-Champaign, United States
  • Waqar Arif, University of Illinois at Urbana-Champaign, United States
  • Auinash Kalsotra, University of Illinois at Urbana-Champaign, United States

Short Abstract: There has been growing interest in understanding the regulation of translating mRNAs within a cell, the translatome. Recently, researchers have coupled RNA-Seq with polysome profiling, a well-established technique in which intact mRNAs are fractionated according to the number of associated ribosomes (their ribosomal occupancy). Data obtained with this method provide a transcriptome-wide view of translating mRNA and have the potential to shed light on mechanisms of regulation. However, a robust computational pipeline to analyze these data has been lacking. Previous approaches have applied clustering techniques to normalized read counts to identify transcripts with differential ribosomal association. A major drawback of these approaches is the lack of statistical testing to discriminate transcripts with significant differences. In the present work, we propose a robust computational approach for the analysis of polysome profiling RNA-Seq data and the identification of transcripts exhibiting translational control. We call our pipeline ShiRlOc (Shifts in Ribosomal Occupancy). Using publicly available datasets, we found that our methodology identifies thousands of transcripts with varying degrees of ribosomal occupancy. However, our pipeline also revealed that a significant portion of expressed transcripts display large variability, so their relative ribosomal occupancy cannot be confidently determined.
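
A per-transcript summary of ribosomal occupancy can be sketched as the abundance-weighted mean number of ribosomes across polysome fractions. The mapping from fraction to ribosome number is a hypothetical input, and this is not ShiRlOc's statistical model.

```python
def mean_ribosome_load(fraction_abundance, ribosomes_per_fraction):
    """Abundance-weighted mean number of ribosomes on one transcript across polysome fractions.

    fraction_abundance: normalized abundance of the transcript in each fraction.
    ribosomes_per_fraction: number of ribosomes assumed for each fraction.
    """
    if len(fraction_abundance) != len(ribosomes_per_fraction):
        raise ValueError("one ribosome count is needed per fraction")
    total = sum(fraction_abundance)
    if total == 0:
        raise ValueError("transcript not detected in any fraction")
    return sum(a * r for a, r in zip(fraction_abundance, ribosomes_per_fraction)) / total
```

A shift between conditions is the difference of two such values; deciding whether a shift is significant requires replicates and a statistical test, which is the gap the abstract highlights and which this summary alone does not address.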

T-57: scSLAM-seq and GRAND-SLAM reveal core features of CMV-induced regulation in single cells
COSI: RNA COSI
  • Florian Erhard, Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Germany

Short Abstract: Current single-cell RNA sequencing (scRNA-seq) approaches analyze total RNA profiles at a single time point but convey little information about the underlying temporal dynamics. Thus, (i) responses to perturbations cannot be measured directly, (ii) kinetics of transcription cannot be investigated, (iii) short-term changes within a few hours of a perturbation are masked by pre-existing RNA and (iv) changes in RNA synthesis and decay cannot be distinguished. We present single-cell SLAM-seq (scSLAM-seq), which integrates metabolic RNA labeling, biochemical nucleoside conversion and scRNA-seq to record transcriptional activity. A new computational approach that we recently developed (GRAND-SLAM) allowed us to precisely quantify both new and old RNA for thousands of genes in hundreds of individual cells. We applied scSLAM-seq to the initial response to lytic cytomegalovirus (CMV) infection, which allowed us to perform dose-response analyses at the single-cell level. Most of the variability in infection efficacy, as well as in the interferon and NF-kB responses, is due to a combination of the cell cycle state at the time of infection and the infection dose. Moreover, scSLAM-seq visualizes transcriptional bursts. We show that these are associated with promoter-intrinsic features, indicating that DNA methylation renders promoters non-permissive between transcriptional bursts.
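The idea of separating new from old RNA by the number of T-to-C conversions per read can be illustrated with a two-component binomial mixture. The sketch below is a simplified version under stated assumptions: the per-base conversion rates for old RNA (`p_err`) and new RNA (`p_conv`) are treated as known, and the new-to-total RNA ratio (NTR) is found by a grid search over the likelihood. The published GRAND-SLAM method is more elaborate; the names and parameter values here are hypothetical.

```python
import math
import random


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k conversions among n covered U positions at per-base rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def estimate_ntr(reads, p_err=0.001, p_conv=0.04, steps=500):
    """Maximum-likelihood new-to-total ratio (NTR) under a two-component binomial mixture.

    reads: list of (n, k) = (U positions covered by the read, observed T-to-C conversions)
    p_err: assumed conversion rate for old (unlabeled) RNA, e.g. sequencing errors
    p_conv: assumed conversion rate for new (labeled) RNA
    """
    # Per-read component likelihoods do not depend on the NTR, so compute them once.
    comps = [(binom_pmf(k, n, p_conv), binom_pmf(k, n, p_err)) for n, k in reads]

    def log_lik(ntr: float) -> float:
        return sum(math.log(ntr * new + (1 - ntr) * old) for new, old in comps)

    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=log_lik)


# Usage: simulate reads from a gene whose true NTR is 0.3 and recover it.
rng = random.Random(7)
simulated = []
for _ in range(2000):
    p = 0.04 if rng.random() < 0.3 else 0.001  # about 30% of reads come from new RNA
    simulated.append((50, sum(rng.random() < p for _ in range(50))))
ntr_hat = estimate_ntr(simulated)
```

Because a labeled read can still show zero conversions, reads cannot simply be classified one by one; the mixture model instead estimates the proportion of new RNA across all reads of a gene.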

T-58: Rfam: the database of 3,000+ non-coding RNA families
COSI: RNA COSI
  • Ioanna Kalvari, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Joanna Argasinska, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Eric P. Nawrocki, National Center for Biotechnology Information, National Library of Medicine, United States
  • Robert D. Finn, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Alex Bateman, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Anton I. Petrov, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Short Abstract: Rfam (http://rfam.org) is the database of non-coding RNA (ncRNA) families, with each family represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. These statistical models are used with the Infernal software to annotate nucleotide sequences with ncRNAs. Since its initial release, Rfam has grown to 3,016 families from 22 RNA types. To refine the genome-centric approach, our collection of non-redundant, complete genomes was expanded to include 14,451 species from all domains of life. The latest release, Rfam 14.1, contains 226 new families and supports RNAcentral identifiers in seed alignments, which enabled us to create ~200 RNA families from metagenomic datasets. Using the latest version of R-scape, Rfam secondary structures now display pseudoknots (manually annotated or predicted by R-scape), which are also searchable using the new text search. We plan to review pseudoknot annotations and add pseudoknots to the consensus secondary structures where possible. To speed up family creation, we are implementing a new cloud-based pipeline allowing pre-approved users to build RNA families. As an additional incentive for contributing families to Rfam, we integrated with the ORCID system so that an Rfam family can be added to its author’s ORCID profile. The new pipeline will be publicly available in late 2019.
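Rfam consensus structures are written in WUSS notation, in which nested pairs use bracket characters and pseudoknotted pairs are marked with matching upper- and lower-case letters. The sketch below parses a simplified WUSS-like string (round, square, curly and angle brackets plus `Aa`-style letter pairs, with every other character treated as unpaired) into base pairs and reports which pairs cross, i.e. form a pseudoknot. It is an illustration under these simplifying assumptions and does not reproduce R-scape or the Rfam pipeline; the function names are hypothetical.

```python
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def parse_pairs(structure: str):
    """Return sorted 0-based (i, j) base pairs from a simplified WUSS-like string."""
    stacks = {}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in OPENERS or ch.isupper():  # bracket opener or pseudoknot opener (A, B, ...)
            stacks.setdefault(ch, []).append(pos)
        elif ch in CLOSERS or ch.islower():  # bracket closer or pseudoknot closer (a, b, ...)
            opener = CLOSERS.get(ch, ch.upper())
            if not stacks.get(opener):
                raise ValueError(f"unbalanced {ch!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
        # Anything else (. , : _ - ~) is treated as unpaired.
    unclosed = [sym for sym, stack in stacks.items() if stack]
    if unclosed:
        raise ValueError(f"unclosed symbols: {unclosed}")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Pairs (i, j) and (k, l) cross when i < k < j < l; any crossing is a pseudoknot."""
    return [(p, q) for p in pairs for q in pairs if p[0] < q[0] < p[1] < q[1]]


# Hypothetical H-type pseudoknot: hairpin-loop bases 6-8 pair with bases 17-19.
knot = crossing_pairs(parse_pairs("((((..AAA..))))..aaa"))
```

Keeping one stack per symbol is what allows crossing pairs to be represented at all: a single shared stack, as used for plain dot-bracket strings, can only describe nested structures.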

T-59: Transcription in mitochondria of Trypanosoma brucei
COSI: RNA COSI
  • Ruslan Afasizhev, Boston University, United States

Short Abstract: The digenetic hemoflagellate Trypanosoma brucei belongs to Kinetoplastea, a taxonomic class defined by possession of a kinetoplast. Decades of kinetoplast DNA (kDNA) studies have unraveled fascinating phenomena of general biological significance, such as DNA bending and mRNA editing, and revealed exquisite details of genome replication, RNA processing and translation. However, the mechanisms of transcription remain virtually unexplored. We present evidence that individual maxicircle protein-coding genes are independently transcribed into 3′-extended precursors. The transcription-defined 5′ terminus is converted into a monophosphorylated state by the pyrophosphohydrolase complex, termed the PPsome. Composed of the MERS1 NUDIX enzyme, the MERS2 pentatricopeptide repeat RNA-binding subunit, and the MERS3 polypeptide, the PPsome binds to specific sequences near mRNA 5′ termini. Most guide RNAs lack PPsome recognition sites and remain triphosphorylated. The RNA editing substrate binding complex (RESC) stimulates MERS1 hydrolase activity and enables an interaction between the PPsome and the polyadenylation machinery. We provide evidence that both 5′ pyrophosphate removal and 3′ adenylation are essential for mRNA circularization, a molecular basis of mitochondrial mRNA stability. Furthermore, we uncover a mechanism by which antisense RNA-controlled 3′-5′ exonucleolytic trimming defines the mRNA 3′ end prior to adenylation. These findings introduce the concept of gene-specific transcriptional control in mitochondria, with broad implications for developmental transitions and pathogenesis.

T-60: RNA quality control in mitochondria of Trypanosoma brucei
COSI: RNA COSI
  • Inna Afasizheva, Boston University, United States

Short Abstract: Most mitochondrial mRNAs in Trypanosoma brucei undergo massive U-insertion/deletion editing to create open reading frames. Here, we report recent advances in understanding the polyadenylation-based surveillance mechanisms that ensure translation of correctly edited mRNAs. Addition of a short 3′ A-tail by the mitochondrial KPAP1 poly(A) polymerase prior to editing protects mRNA from 3′-5′ degradation during the editing process. Conversely, completion of editing is marked by extension of the A-tail into a long A/U heteropolymer. The distinct roles and editing-dependent temporal separation of A-tailing and A/U-tailing events imply the existence of sequence-specific factors that sense the mRNA’s editing status and regulate 3′ additions. We identified pentatricopeptide repeat (PPR)-containing RNA-binding proteins responsible for monitoring mRNA editing status, 3′ modifications, and direct binding to the ribosome. We show that Kinetoplast Polyadenylation Factor 3 (KPAF3) specifically recognizes the 3′ end of pre-edited transcripts, thereby stabilizing mRNAs and stimulating polyadenylation. Initiation of editing displaces KPAF3, leaving the mRNA reliant on the short A-tail as its stability determinant. We further show that Kinetoplast Polyadenylation Factor 4 (KPAF4) recognizes a stretch of five adenosines, acting as a poly(A)-binding protein. This binding prevents translational activation of partially edited mRNAs. Collectively, our findings reveal previously unappreciated roles of PPR proteins as polyadenylation factors, poly(A)-binding proteins, and ribosomal proteins.

T-61: Exploring the X-Chromosome Inactivation (XCI) process with single-allele resolution
COSI: RNA COSI
  • Guido Pacini, Max Planck Institute for Molecular Genetics, Germany

Short Abstract: In mammals, dosage compensation between the sexes is achieved through a process known as X-chromosome inactivation (XCI). Each cell of the female embryo silences one randomly chosen X chromosome, and the inactive X chromosome (Xi) maintains its silenced state in all daughter cells. Notably, up-regulation of the long non-coding RNA Xist from the Xi initiates XCI and mediates gene silencing in cis. To investigate random XCI in its endogenous context, we use allele-specific single-cell transcriptomics on hybrid mouse embryonic stem cells at different time points during differentiation. The high number of polymorphisms between the two parental strains (B6 and Castaneus) provides allele-specific resolution for a large number of genes. A large fraction of cells initially up-regulates Xist from both chromosomes, which partially silences them; this state is then resolved to a mono-allelic one. The two chromosomes have different silencing dynamics. While most X-linked genes show similar kinetics, others escape silencing on a single allele. Differential expression analyses comparing cells with the highest and lowest Xist expression levels identify a set of putative Xist regulators, including many known regulators. Our analysis provides a detailed picture of random X inactivation at the single-cell and single-allele levels.
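Allele-specific resolution rests on counting, for each gene in each cell, the reads that overlap SNPs distinguishing the two parental strains. A minimal sketch, assuming these per-allele counts for the B6 and Castaneus (CAST) alleles are already available; the thresholds (`min_reads`, `mono_cutoff`) and the labels are hypothetical choices, not the authors' criteria.

```python
def allelic_ratio(b6: int, cast: int):
    """Fraction of allele-informative reads coming from the B6 allele (None if no reads)."""
    total = b6 + cast
    return b6 / total if total else None


def classify(b6: int, cast: int, min_reads: int = 10, mono_cutoff: float = 0.9) -> str:
    """Label one gene in one cell as 'B6', 'CAST', 'biallelic' or 'low coverage'."""
    if b6 + cast < min_reads:
        return "low coverage"
    ratio = allelic_ratio(b6, cast)
    if ratio >= mono_cutoff:
        return "B6"  # expressed (almost) only from the B6 allele
    if ratio <= 1 - mono_cutoff:
        return "CAST"  # expressed (almost) only from the CAST allele
    return "biallelic"
```

Applied to Xist counts, for example, such a labelling would separate cells expressing Xist from both X chromosomes from those that have resolved to mono-allelic expression; applied to other X-linked genes, it tracks silencing of each allele separately.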

T-62: A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model
COSI: RNA COSI
  • Manato Akiyama, Keio University, Japan
  • Kengo Sato, Keio University, Japan
  • Yasubumi Sakakibara, Keio University, Japan

Short Abstract: A popular approach for predicting RNA secondary structure is the thermodynamic nearest-neighbor model, which finds the most thermodynamically stable secondary structure, i.e. the one with minimum free energy (MFE). For further improvement, alternative approaches based on machine learning techniques have been developed. These can employ fine-grained models with much richer feature representations that are able to fit the training data closely. Although a machine learning-based fine-grained model achieved extremely high prediction accuracy, a risk of overfitting has been reported for such models. In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach with the machine learning-based weighted approach. Our fine-grained model combines experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed feature contexts, which are trained by a structured support vector machine (SSVM) with ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy among the compared methods, with no heavy overfitting observed. The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.
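For orientation, a generic ℓ1-regularized structured SVM in which the score of a candidate structure adds learned feature weights to the thermodynamic term can be written as follows. The notation is assumed for illustration ($\Phi$ for feature counts of sequence $x$ and structure $y$, $\Delta G$ for the nearest-neighbor free energy, $\mathcal{L}$ for a structural loss such as the number of incorrectly predicted base pairs, $C$ for the regularization strength); it is a textbook-style sketch, not necessarily the exact formulation used in the paper.

```latex
f_{\mathbf{w}}(x, y) = -\Delta G(x, y) + \mathbf{w}^{\top} \Phi(x, y),
\qquad
\hat{y}(x) = \arg\max_{y \in \mathcal{Y}(x)} f_{\mathbf{w}}(x, y)

\min_{\mathbf{w}} \; \sum_{i=1}^{N} \Big[ \max_{y \in \mathcal{Y}(x_i)}
  \big( f_{\mathbf{w}}(x_i, y) + \mathcal{L}(y_i, y) \big) - f_{\mathbf{w}}(x_i, y_i) \Big]
  + C \, \lVert \mathbf{w} \rVert_{1}
```

When $\mathcal{L}$ decomposes over positions or base pairs, the loss-augmented maximization can be solved with the same dynamic programming used for prediction, and the ℓ1 term drives many weights to exactly zero, which is one way such a model limits overfitting.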

T-63: miRNAmotif – pre-miRNA interactions with protein in physiological conditions and cancer
COSI: RNA COSI
  • Martyna Urbanek-Trzeciak, Institute of Bioorganic Chemistry PAS, Poland

Short Abstract: Most human miRNAs are produced from primary precursors by the canonical protein machinery, which includes the DROSHA and DICER RNases. However, multiple other regulatory proteins that bind directly to distinct miRNA precursors are involved in miRNA biogenesis. Here, we present two example applications of the miRNAmotif software, which enables analysis of miRNA precursors containing known sequence motifs that can be recognized by diverse RNA-binding proteins. First, we searched for miRNA precursors containing the motif recognized by Lin28 (GAGG), a protein highly expressed in testis and placenta. We found 155 such pre-miRNAs, including known interactors: the let-7 family and mir-9-1. We compared these results with the miRNA tissue-specific expression database miRmine and found 40 pre-miRNAs expressed in testis and 36 in placenta, giving 43 unique pre-miRNAs that are potentially regulated by Lin28. We also used miRNAmotif to investigate whether pre-miRNA interactions with known RNA-binding proteins may be affected by somatic mutations identified in miRNA genes in two types of lung cancer. The analysis identified 84 mutations that disrupt or create motifs. The most frequently affected sequence motifs were UGU and VCAUCH, recognized by DGCR8 and DDX17, respectively. Code: www.github.com/martynaut/mirnamotif Webserver: http://mirnamotif.ibch.poznan.pl Funding: Polish National Science Centre [2016/22/A/NZ2/00184, 2015/17/N/NZ3/03629]
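The motif search and mutation analysis described above amount to scanning precursor sequences for IUPAC-coded motifs before and after a substitution. A minimal sketch in that spirit, not the miRNAmotif code: IUPAC codes such as V (A/C/G) and H (A/C/U) are expanded into a regular expression, overlapping matches are found with a lookahead, and a substitution is labelled by comparing motif start positions. The example precursor fragment and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]", "K": "[GU]", "M": "[AC]",
    "B": "[CGU]", "D": "[AGU]", "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def motif_regex(motif: str):
    # A zero-width lookahead reports overlapping hits (e.g. GAGG twice in GAGGAGG).
    return re.compile("(?=(" + "".join(IUPAC[c] for c in motif.upper()) + "))")


def find_motifs(seq: str, motifs):
    """Return {motif: [0-based start positions]} for each motif in seq (DNA or RNA input)."""
    seq = seq.upper().replace("T", "U")
    return {m: [hit.start() for hit in motif_regex(m).finditer(seq)] for m in motifs}


def mutation_effect(seq: str, pos: int, alt: str, motif: str) -> str:
    """Classify a single substitution by its effect on occurrences of one motif."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    before = set(find_motifs(seq, [motif])[motif])
    after = set(find_motifs(mutant, [motif])[motif])
    lost, gained = before - after, after - before
    if lost and gained:
        return "disrupted and created"
    if lost:
        return "disrupted"
    if gained:
        return "created"
    return "no change"


precursor = "UGAGGAGGUUGUACAUCAU"  # hypothetical precursor fragment
hits = find_motifs(precursor, ["GAGG", "UGU", "VCAUCH"])
```

Because a substitution does not shift positions, comparing start-position sets is enough to tell whether a motif occurrence was lost, gained, or both.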

T-64: RNAs associated with presence of circulating tumor cells with mesenchymal phenotype in primary breast cancer tumours
COSI: RNA COSI
  • Dominik Hadzega, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
  • Marian Karaba, National Cancer Institute, Bratislava, Slovakia
  • Gabriel Minarik, Institute of Molecular Biomedicine, Faculty of Medicine, Comenius University, Bratislava, Slovakia
  • Juraj Benca, National Cancer Institute and Department of Medicine, St. Elizabeth University, Bratislava, Slovakia
  • Tatian Sedlackova, Institute of Molecular Biomedicine, Faculty of Medicine, Comenius University, Bratislava, Slovakia
  • Jan Macuch, National Cancer Institute, Bratislava, Slovakia
  • Gabriela Sieberova, National Cancer Institute, Bratislava, Slovakia
  • Daniel Pindak, National Cancer Institute and Slovak Medical University, Bratislava, Slovakia
  • Katarina Kalavska, National Cancer Institute & 2nd Department of Oncology, Faculty of Medicine, Comenius University, Bratislava, Slovakia
  • Jozef Mardiak, National Cancer Institute & 2nd Department of Oncology, Faculty of Medicine, Comenius University, Bratislava, Slovakia
  • Lubos Klucar, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
  • Michal Mego, National Cancer Institute, Bratislava, Slovakia

Short Abstract: Circulating tumor cells (CTCs) are cells found in the blood of cancer patients; they are associated with a worse prognosis and are known to be a key component of the metastatic cascade. We studied primary breast cancer tumors from 72 patients. The aim of the study was to identify genes and pathways associated with the presence of CTCs with a mesenchymal phenotype in primary breast cancer. Gene and microRNA expression levels were obtained with SurePrint G3 Human Gene Expression v3 and Human microRNA Microarray v21.0 (both Agilent Technologies). We performed statistical analysis of two phenotypic groups using the limma package in R. Both groups consisted of tumor data from breast cancer patients: one from patients with mesenchymal CTCs present in the blood and the other from patients without CTCs. We identified 235 genes expressed at significantly different levels in tumors of patients with CTCs showing an epithelial-mesenchymal transition (EMT) phenotype in the blood compared with tumors of patients without detectable CTCs. A total of 171 miRNAs were differentially expressed, but only under less strict result-filtering conditions, suggesting a less important role of miRNAs in the studied processes. After the microarray analysis, we searched for overrepresented ontologies and identified cadherin-related pathways as the most significant result. This work was supported by grant APVV-16-0010.
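Strict versus less strict filtering of differential expression results usually means applying different cutoffs to multiple-testing-adjusted p-values. As a hedged illustration (the abstract does not state which correction or cutoffs were used), the Benjamini-Hochberg adjustment, which is the default `adjust.method` of limma's `topTable`, can be written in a few lines; the function name and the example cutoff are illustrative.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values for four features
adjusted = bh_adjust(raw)
strict_hits = [i for i, q in enumerate(adjusted) if q < 0.03]  # example cutoff only
```

Relaxing the cutoff on the adjusted values admits more features at the cost of a higher expected false discovery rate, which is the trade-off behind the interpretation of the miRNA results above.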

T-65: DIANA-mAP: An Automated pipeline for the quantification of microRNAs
COSI: RNA COSI
  • Athanasios Alexiou, University of Thessaly, Greece
  • Dimitrios Zisis, Hellenic Pasteur Institute, Greece
  • Ioannis Kavakiotis, University of Thessaly, Greece
  • Antonis Koussounadis, DIANA-Lab, Department of Electrical & Computer Engineering, University of Thessaly, 382 21 Volos, Greece
  • Dimitra Karagkouni, DIANA-Lab, Department of Electrical & Computer Engineering, University of Thessaly, 382 21 Volos, Greece
  • Artemis Hatzigeorgiou, University of Thessaly, Greece

Short Abstract: Next-Generation Sequencing (NGS) technologies have led to inexpensive data production, transforming every research area in biology and medicine. Appropriate pre-processing of NGS data is the most important prerequisite in almost all data-driven biological and biomedical studies. microRNAs are short (∼23 nt) single-stranded non-coding RNA molecules that post-transcriptionally regulate gene expression through target cleavage, degradation and/or translational suppression. We developed DIANA-mAP, a fully automated computational pipeline with an emphasis on pre-processing that allows users to perform microRNA NGS analysis, from raw data to quantification and differential expression, in an easy, scalable, efficient, and intuitive way. DIANA-mAP can access and download publicly available datasets from online repositories and perform a sequence of mandatory steps (e.g. quality control, adapter and quality trimming, genome alignment) to produce high-quality datasets for downstream data mining and statistical analysis. It is fully automated and parallelizable, and will be offered as a standalone or dockerized tool requiring no dependency installation.
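The adapter-trimming step of such a pipeline can be illustrated with a simple exact-match 3′ trimmer. This is a hypothetical sketch, not DIANA-mAP code: dedicated trimming tools also tolerate sequencing errors, whereas this version only looks for the full adapter or, at the read end, a prefix of it, and then keeps inserts within an assumed microRNA-sized length window. The adapter sequence is the widely used Illumina small RNA 3′ adapter, shown only as an example; the correct one depends on the library preparation kit.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # example 3' adapter; use the one from your library prep


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Remove the 3' adapter (exact match), including a partial adapter at the read end."""
    full = read.find(adapter)
    if full != -1:
        return read[:full]
    # Look for adapter prefixes hanging off the 3' end of the read, longest first.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def keep_mirna_like(reads, min_len: int = 18, max_len: int = 26):
    """Trim each read and keep inserts in an assumed miRNA-sized window."""
    trimmed = (trim_adapter(r) for r in reads)
    return [t for t in trimmed if min_len <= len(t) <= max_len]


# Hypothetical reads: a let-7a-like insert followed by a full or a partial adapter.
insert = "TGAGGTAGTAGGTTGTATAGTT"
kept = keep_mirna_like([insert + ADAPTER + "ATCTCG", insert + ADAPTER[:8], "ACGT" + ADAPTER])
```

The `min_overlap` threshold avoids clipping genuine insert bases that happen to match the first one or two adapter nucleotides.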

T-66: Transcriptome analysis of chronic lymphocytic leukemia reveals intron retention as a common mechanism regulating B cell receptor signaling
COSI: RNA COSI
  • Murat Iskar, German Cancer Research Center, Germany

Short Abstract: The splicing machinery is frequently aberrant in chronic lymphocytic leukemia (CLL). In this study, we systematically characterized cancer-specific patterns in RNA splicing to uncover the role of spliceosome dysregulation in CLL malignancy. For this purpose, we analyzed deep, total, strand-specific RNA-seq and matched epigenomic data derived from 19 CLL patients and 7 healthy donors. Our transcriptome-wide analysis reveals a dramatic decrease in intron retention levels in CLL cells, indicating increased splicing efficiency in CLL compared with CD19-sorted B cells. After excluding divergent splicing events associated with B cell differentiation, we used the IRFinder tool to identify 1051 differentially retained introns specific to CLL. Pathway analysis showed that the affected genes are involved in several pathways critical for CLL survival, such as B-cell receptor signaling. An integrative analysis of RNA splicing patterns revealed transcriptome-wide enrichment of intron retention at skipped exons, likely modulating the protein isoforms of various genes. DNA sequence characterization of splice sites revealed higher GC content for differentially retained introns. We found that a significant fraction of the CLL-specific post-transcriptional dysregulation could be reversed by inhibiting the splicing factor SF3B1. Taken together, this study proposes intron retention as a widespread mechanism diversifying the CLL transcriptome towards pathogenesis.
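Intron retention is commonly quantified per intron as the fraction of transcripts that keep it. The sketch below uses a simplified IR ratio in the spirit of IRFinder: intronic read depth divided by intronic depth plus spliced-read support. The actual IRFinder definition (for example, how intronic depth is computed and which junction counts are used) differs, and the function names and example counts are assumptions.

```python
def ir_ratio(intron_depth: float, spliced_reads: float) -> float:
    """Simplified IR ratio: retained / (retained + spliced) abundance, in [0, 1]."""
    total = intron_depth + spliced_reads
    return intron_depth / total if total > 0 else float("nan")


def delta_ir(case, control) -> float:
    """Difference in IR ratio for one intron; negative = less retention in case.

    case, control: (intron_depth, spliced_reads) tuples for the same intron.
    """
    return ir_ratio(*case) - ir_ratio(*control)


# Hypothetical intron: lower retention in CLL cells than in CD19-sorted B cells.
change = delta_ir(case=(5, 95), control=(20, 80))
```

A negative difference for many introns would correspond to the increased splicing efficiency reported above; deciding which differences are significant additionally requires a statistical test across replicates.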

T-67: CloseCall: a novel pipeline to identify RNA-RNA interactions
COSI: RNA COSI
  • Steven Wingett, The Babraham Institute, United Kingdom
  • Jorg Morf, The Babraham Institute, United Kingdom
  • Simon Andrews, The Babraham Institute, United Kingdom

Short Abstract: Proximity RNA-seq is a new technique to elucidate the three-dimensional organisation of RNA in cells. Previously, pairwise analysis of RNA-RNA interactions has been restricted to direct base-paired contacts or to short-range distances between ligated RNA ends. In contrast, our new method identifies associations between pairs or groups of transcripts, irrespective of the mechanism holding the molecules together. Proximity RNA-seq uses massive-throughput RNA barcoding of sub-nuclear particles in water-in-oil emulsion droplets, followed by Illumina sequencing. Our studies revealed a bipartite organisation of nuclear RNAs, in which transcript families (of varying tissue specificity, speed of RNA polymerase elongation and levels of differential splicing) were positioned in discrete compartments. The experimental innovations were developed in unison with a new bioinformatics pipeline named CloseCall. This software, written in Perl and Java, was customised to meet the unique requirements of proximity RNA-seq. Among other tasks, the pipeline pools co-barcoded transcripts, allocates reads to a specialised genome annotation and identifies statistically significant interactions using a Monte Carlo simulation. The software is freely available under an Open Source License.
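The use of a Monte Carlo simulation to call significant interactions can be illustrated with a generic permutation scheme. In this hypothetical sketch (not CloseCall's algorithm, and in Python rather than the Perl and Java of the actual software), each barcode is a list of transcripts; the null model reshuffles all transcript occurrences across barcodes while keeping barcode sizes and each transcript's total count fixed, and the empirical p-value is the fraction of simulations in which a pair shares at least as many barcodes as observed.

```python
import random
from collections import Counter
from itertools import combinations


def pair_counts(barcodes):
    """Count how many barcodes contain each unordered pair of distinct transcripts."""
    counts = Counter()
    for group in barcodes:
        for a, b in combinations(sorted(set(group)), 2):
            counts[(a, b)] += 1
    return counts


def monte_carlo_pvalue(barcodes, pair, n_sim=1000, seed=0):
    """Empirical p-value that `pair` co-occurs at least as often as observed by chance."""
    rng = random.Random(seed)
    pair = tuple(sorted(pair))
    observed = pair_counts(barcodes)[pair]
    pool = [t for group in barcodes for t in group]  # every transcript occurrence
    sizes = [len(group) for group in barcodes]
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(pool)
        shuffled, start = [], 0
        for size in sizes:  # refill barcodes with their original sizes
            shuffled.append(pool[start:start + size])
            start += size
        if pair_counts(shuffled)[pair] >= observed:
            exceed += 1
    # The add-one correction avoids reporting an impossible p-value of exactly 0.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical data: X and Y always share a barcode; G transcripts are grouped at random.
barcodes = [["X", "Y", f"F{i}"] for i in range(10)]
barcodes += [[f"G{3 * i}", f"G{3 * i + 1}", f"G{3 * i + 2}"] for i in range(10)]
p_xy = monte_carlo_pvalue(barcodes, ("X", "Y"), n_sim=200)
```

With `n_sim` simulations the smallest reportable p-value is 1/(n_sim + 1), so the number of simulations bounds how strongly a pair can be called significant, especially after correcting for the many pairs tested.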

T-68: Inferring RNA-binding sites, motifs and regulation from CLIP data with RCRUNCH
COSI: RNA COSI
  • Maria Katsantoni, University of Basel and Swiss Institute of Bioinformatics, Switzerland
  • Erik van Nimwegen, University of Basel and Swiss Institute of Bioinformatics, Switzerland
  • Mihaela Zavolan, University of Basel and Swiss Institute of Bioinformatics, Switzerland

Short Abstract: RNA-binding proteins (RBPs) are essential for various processes in the eukaryotic cell. Currently, the standard method for identifying the RNA targets of a specific RBP is CLIP (Cross-Linking and ImmunoPrecipitation; Ule, J., et al., 2005). Methods developed for the analysis of ChIP (chromatin immunoprecipitation) data could be applicable to CLIP, since the basic principle is the same. One of these is CRUNCH (Completely Automated Analysis of ChIP-seq Data; Berger, S., et al., 2016), which uses an innovative approach to model sampling errors. We decided to apply this model to eCLIP, a variant of the basic CLIP protocol widely used by the ENCODE consortium (Van Nostrand, E. L., et al., 2016). We present RCRUNCH, an automated tool currently optimised for the analysis of eCLIP data. It performs automated preprocessing of the reads, modeling of experimental errors and postprocessing. De novo motif prediction is also performed, using PhyloGibbs (Siddharthan, R., et al., 2005). RCRUNCH is implemented as a Snakemake pipeline (Köster, J., & Rahmann, S., 2012) that is easy to use, requires minimal intervention from the user and handles RBP binding in both mature and pre-mRNAs, which has not commonly been done so far.